1 Introduction


Parametric regression models, which impose a specific parametric form on
the regression relationship between the response and the predictors, are
commonly used and studied in practice. This is due to their well-established
theory and their ease of implementation and interpretation. However, when
the dimension of the predictor vector is high, or even moderate, it is
difficult to specify the regression form correctly, and it is natural to ask
whether a given parametric regression fits the data adequately. Therefore,
it is necessary to conduct model checking to determine a suitable regression
model before any further statistical analysis. To avoid model
mis-specification, nonparametric regression models have been proposed and
investigated. However, in this case the nonparametric estimation is usually
inaccurate, a problem documented as the 'curse of dimensionality'.

Consider the following parametric single-index model

$$Y = g(\beta^T X, \theta) + \varepsilon, \qquad (1.1)$$

where $Y$ is the response with covariate $X \in \mathbb{R}^p$, $\beta$ and
$\theta$ are parameter vectors of dimensions $p$ and $d$, respectively, and
$E(\varepsilon|X) = 0$. Besides, $g(\cdot)$ is a known link function and the
superscript $T$ denotes transposition. The nonparametric regression model
takes the following form:

$$Y = m(X) + \varepsilon, \qquad (1.2)$$

where $m(\cdot)$ is the unknown mean function and $E(\varepsilon|X) = 0$.
Several proposals are available in the literature for testing model (1.1)
against model (1.2). To the best of our knowledge, the existing tests can be
classified into two categories, local smoothing methods and global smoothing
methods, each with its own advantages and disadvantages. The former
generally rely on nonparametric regression estimators, and the latter
involve empirical processes. As commented in Guo et al. (2014), the former
are more sensitive to high-frequency alternative models while the latter may
favor smooth alternatives. On the other hand, it is well known that the
former suffer severely from the curse of dimensionality, since with
bandwidth $h$ the typical convergence rate is only $O(n^{-1/2}h^{-p/4})$,
which is very slow in high-dimensional situations. Because of data
sparseness in high-dimensional space, the power of the latter often drops
very significantly as well. Further, for these classical tests,
computationally intensive resampling or Monte Carlo approximation is usually
needed to determine critical values or compute p-values.

A variety of methods belong to the first class. Among others, the quadratic
conditional moment-based test was proposed by Zheng (1996), the minimum
distance test was developed by Koul and Ni (2004), and the distribution
distance test was introduced by Van Keilegom et al. (2008). See also Härdle
and Mammen (1993), Dette (1999) and Zhang and Dette (2004). We now turn to
the latter category. A test based on a residual-marked empirical process
was presented by Stute (1997). An innovation approach for this empirical
process was suggested by Stute et al. (1998b). A relevant reference for
generalized linear models is Stute and Zhu (2002). Koul and Stute (1999)
studied a class of tests for an autoregressive model, which are based on
empirical processes marked by certain residuals. Stute et al. (1998a)
recommended the wild bootstrap to approximate the limiting null distribution
of the empirical process-based test statistic. For a comprehensive review,
see González-Manteiga and Crujeiras (2013).

Inspired by the classical likelihood ratio test, Fan et al. (2001)
introduced a nonparametric generalized likelihood ratio (NGLR) test for the
hypothetical model in (1.1) versus the alternative model in (1.2). A
significant property of NGLR is that the limiting null distribution does not
depend on nuisance functions, exhibiting what is known as the Wilks
phenomenon. This test has been applied in many different situations, such as
varying-coefficient partially linear models (Fan and Huang, 2005) and
partially linear additive models (Jiang et al., 2007). For more details, see
Fan and Jiang (2007). However, similar to other local smoothing tests, NGLR
also suffers from the curse of dimensionality because of the inevitable use
of multivariate nonparametric function estimation. To be precise, under the
null hypothesis, the NGLR test statistic converges to its limit at the rate
of order $O(n^{-1/2}h^{-p/4})$ and can only detect alternatives distinct
from the null hypothesis at the rate of order $O(n^{-1/2}h^{-p/4})$.
Further, its limit has a bias term and, as Fan and Li (1997) and later
research in this field pointed out, this bias term makes it very difficult
to control the type I error. Also, as Dette and von Lieres und Wilkau (2001)
pointed out, for finite sample sizes the bias, which converges to infinity,
has to be taken into account. See also Zhang and Dette (2004). To determine
critical values, a resampling or Monte Carlo approximation such as the
bootstrap is required even when the Wilks phenomenon holds. But the
bootstrap approximation is computationally intensive and, even so, the type
I error cannot be well controlled when the dimension of the predictors is
high. We will see this in the simulation section below. Moreover, even when
the dimension $p$ is only moderate, the power can be very low. In the
simulation section, we will show that the power drops significantly as the
dimension increases. Clearly, it is desirable to have a test that controls
the type I error well without the assistance of resampling approximation
and, more importantly, handles high-dimensional data with acceptable
performance.

Note that for the parametric single-index model (1.1), the regression
relationship of the response $Y$ depends on the predictors $X$ only through
the one-dimensional index $\beta^T X$. In other words, $\beta^T X$ captures
all the regression information on the response. Thus, the model (1.1) can be
viewed as a model with a dimension-reduction structure. Motivated by this
feature, Stute and Zhu (2002) proposed a test for the model (1.1) that is
fully based on $\beta^T X$, as follows:

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \{y_i - g(\hat\beta^T x_i, \hat\theta)\}\, I(\hat\beta^T x_i \le u),$$

where $\hat\beta$ and $\hat\theta$ are, under the null hypothesis, root-$n$
consistent estimates of $\beta$ and $\theta$, respectively, and $I(\cdot)$
is the indicator function. The above test statistic largely avoids the
dimensionality problem. But it is a directional test that cannot detect
general alternatives. Guo et al. (2015) discussed this problem in detail and
proposed a test that retains the dimension-reduction nature and is still
omnibus.

In this paper, we try to solve simultaneously the two problems from which
the original NGLR suffers, by reducing both the bias and the dimensionality.
To this end, we propose a bias-corrected version of NGLR such that the bias
is removed asymptotically, and we combine the idea in Guo et al. (2015) with
NGLR to circumvent the curse of dimensionality. The details are as follows.
Consider a more general alternative model:

$$Y = m(B^T X) + \varepsilon, \qquad (1.3)$$

where $B$ is a $p \times q$ matrix with orthonormal columns, that is,
$B^T B = I_q$ with $I_q$ the $q$-dimensional identity matrix, $q$ is unknown
with $1 \le q \le p$, $m(\cdot)$ is an unknown smooth function and
$E(\varepsilon|X) = 0$. Under the null hypothesis, the model (1.3) reduces
to the model (1.1) with $q = 1$, $B = \beta/\|\beta\|_2$ and
$m(\cdot) = g(\cdot)$. Here $\|\cdot\|_2$ denotes the $L_2$ norm. Under the
alternative hypothesis, the model (1.3) can be regarded as a multi-index
model with $q \ge 1$ indices. If the alternative model is a purely
nonparametric regression model as in (1.2), $q$ is equal to $p$.

Since we found that the bias comes from the slow convergence rate of the
nonparametric estimate in the residual sum of squares, we instead consider a
sum of products of the residuals from the parametric and nonparametric fits.
It will be confirmed theoretically that the bias is then removed
asymptotically. As a by-product, the bias-reduction-based NGLR statistic
also has a smaller asymptotic variance. The simulation studies later show
its advantages in maintaining the significance level and enhancing power.

As for dimensionality reduction, the key is how to adapt to the model
structure under the null hypothesis and how, under the alternative, the test
can automatically adapt to the model structure so that it remains omnibus.
To achieve this goal, we need to identify $\beta$ and $B$ automatically, and
then construct a test based on this model-adaptive identification to fully
use the information provided by the underlying model. The estimation
procedure will be described in the next section. The model-adaptive version
of NGLR has two desired features: the Wilks phenomenon still holds, and the
test statistic converges to its limit at the rate of order
$O(n^{-1/2}h^{-1/4})$ under the null hypothesis and can detect local
alternatives distinct from the null at the rate of order
$O(n^{-1/2}h^{-1/4})$ rather than $O(n^{-1/2}h^{-p/4})$.
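To give a feel for the two rates just quoted, the following minimal sketch
evaluates $n^{-1/2}h^{-p/4}$ and $n^{-1/2}h^{-1/4}$ for one sample size. The
function name and the bandwidth $h = n^{-1/5}$ are illustrative assumptions,
not choices made in the paper.

```python
def detectable_rate(n, h, dim):
    """Order n^{-1/2} h^{-dim/4} of the smallest detectable departure from the null."""
    return n ** -0.5 * h ** (-dim / 4.0)


n, p = 400, 8
h = n ** -0.2  # hypothetical bandwidth, used only for illustration

rate_classical = detectable_rate(n, h, dim=p)  # smoothing in p dimensions
rate_adaptive = detectable_rate(n, h, dim=1)   # smoothing along one index
print(rate_classical, rate_adaptive)
```

A smaller value means that alternatives closer to the null can be detected,
so for $h < 1$ the one-dimensional rate is the more favorable one.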

The rest of this article is organized as follows. In Section 2, the
model-adaptive enhanced versions of the NGLR test, without and with bias
correction, are constructed. The methods for estimating $B$ and identifying
the structural dimension $q$ are also presented in this section. The
asymptotic properties under the null hypothesis and under local and global
alternative hypotheses are investigated in Section 3. Simulation studies and
a real data analysis are conducted to evaluate the finite sample performance
in Sections 4 and 5, respectively. All the technical proofs are relegated to
the Appendix.


2 Test statistic construction and relevant properties

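The hypotheses are:

$$H_0: E(Y|X) = g(\beta^T X, \theta) \quad \text{for some } \beta \in \mathbb{R}^p,\ \theta \in \mathbb{R}^d;$$

$$H_1: E(Y|X) = m(B^T X) \ne g(\beta^T X, \theta) \quad \text{for any } \beta \in \mathbb{R}^p,\ \theta \in \mathbb{R}^d.$$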

2.1 Model-adaptive enhancement of the NGLR test 

Suppose $(y_i, x_i)$, $i = 1, \ldots, n$, is a random sample and the error
$\varepsilon$ in the nonparametric model (1.2) is i.i.d. $N(0, \sigma^2)$.
The normality is used only to motivate the construction of the test
statistic. As in Fan et al. (2001), the conditional log-likelihood function
of $y_i$ given $x_i$ can be written as:

$$l(m, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}[y_i - m(x_i)]^2.$$

Under the null hypothesis $H_0$,

$$l(\hat\beta, \hat\theta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} RSS_0. \qquad (2.1)$$

Here $RSS_0$ is the residual sum of squares under the hypothetical model
(1.1), that is,

$$RSS_0 = \sum_{i=1}^{n}[y_i - g(\hat\beta^T x_i, \hat\theta)]^2, \qquad (2.2)$$

where $\hat\beta$ and $\hat\theta$ denote the ordinary least squares (OLS)
estimates of the parameters $\beta$ and $\theta$, respectively. Further,
maximizing the likelihood in (2.1) with respect to the nuisance parameter
$\sigma^2$ yields $\hat\sigma^2 = n^{-1}RSS_0$, and substituting this
estimate into (2.1) yields the following log-likelihood function:

$$l(\hat\beta, \hat\theta, \hat\sigma^2) = -\frac{n}{2}\ln(RSS_0) - \frac{n}{2}[1 + \ln(2\pi/n)]. \qquad (2.3)$$
Under the alternative hypothesis $H_1$, with a similar argument, we can
obtain

$$l(\hat m, \hat B(\hat q), \hat\sigma^2) = -\frac{n}{2}\ln(RSS_1) - \frac{n}{2}[1 + \ln(2\pi/n)]. \qquad (2.4)$$

Here $\hat B(\hat q)$ is an estimate of $B$, and $RSS_1$ is the residual sum
of squares under the alternative model (1.3), that is,

$$RSS_1 = \sum_{i=1}^{n}[y_i - \hat m(\hat B(\hat q)^T x_i)]^2. \qquad (2.5)$$

We will later specify the estimate $\hat m(\hat B(\hat q)^T x_i)$ of
$m(B^T x_i)$.

Under the null hypothesis $H_0$, $l(\hat m, \hat B(\hat q), \hat\sigma^2)$
should be close to $l(\hat\beta, \hat\theta, \hat\sigma^2)$, while under the
alternative hypothesis $H_1$ the two should deviate from each other. This
motivates us to define the following test statistic:


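$$T_n = l(\hat m, \hat B(\hat q), \hat\sigma^2) - l(\hat\beta, \hat\theta, \hat\sigma^2) = \frac{n}{2}\log\frac{RSS_0}{RSS_1} \approx \frac{n}{2}\,\frac{RSS_0 - RSS_1}{RSS_1}. \qquad (2.6)$$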
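As a small numerical illustration of (2.6), the sketch below evaluates both
the logarithmic form and its first-order approximation from two residual
sums of squares. The function name and the input values are hypothetical.

```python
import math


def glr_statistic(rss0, rss1, n):
    """Return (n/2) log(RSS_0/RSS_1) and its approximation (n/2)(RSS_0 - RSS_1)/RSS_1."""
    exact = 0.5 * n * math.log(rss0 / rss1)
    approx = 0.5 * n * (rss0 - rss1) / rss1
    return exact, approx


# Illustrative values only: the two fits have nearly equal residual sums of squares.
exact, approx = glr_statistic(rss0=101.0, rss1=100.0, n=200)
print(exact, approx)
```

Because $\log(1 + x) \le x$, the logarithmic form never exceeds the
approximation, and the two agree closely when $RSS_0/RSS_1$ is near one.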


From the above construction, the statistic seems to differ from the
original NGLR test only in having an extra estimate of $B$. However, we will
see that the extra estimate $\hat B(\hat q)$ of $B$ plays a very important
role in the model adaptation of the test.

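Once an estimate $\hat B(\hat q)$ of $B$ is available, $m(B^T x_i)$ can be
estimated by:

$$\hat m(\hat B(\hat q)^T x_i) = \sum_{j=1}^{n} w_{ij}(\hat B(\hat q))\, y_j. \qquad (2.7)$$

Here

$$w_{ij}(\hat B(\hat q)) = \frac{\mathcal{K}\{\hat B(\hat q)^T (x_i - x_j)/h\}}{\sum_{l=1}^{n} \mathcal{K}\{\hat B(\hat q)^T (x_i - x_l)/h\}},$$

with $\mathcal{K}(\cdot)$ being a $\hat q$-dimensional kernel function and
$h$ being the bandwidth. Moreover, $\mathcal{K}(\cdot)$ is a symmetric
kernel with compact support and is of order $r \ge 2$, that is,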


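$$\frac{(-1)^j}{j!}\int u^j \mathcal{K}(u)\,du =
\begin{cases}
1 & \text{if } j = 0, \\
0 & \text{if } 1 \le j \le r - 1, \\
k_r > 0 & \text{if } j = r.
\end{cases}$$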
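The kernel weights in (2.7) can be sketched as follows. This is a minimal
illustration under stated assumptions: the quartic kernel is the one used in
the simulation section, the product form of the multivariate kernel is an
assumption, and the names `biweight` and `nw_weights`, the data and the
matrix `B_hat` are all hypothetical.

```python
import numpy as np


def biweight(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def nw_weights(Z, h):
    """Weight matrix w_ij from projected covariates Z = X B (n x q)."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    diff = (Z[:, None, :] - Z[None, :, :]) / h   # (n, n, q) scaled differences
    K = np.prod(biweight(diff), axis=2)          # product kernel (assumption)
    return K / K.sum(axis=1, keepdims=True)      # each row sums to one


# Hypothetical data: only the first coordinate of X matters.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
B_hat = np.array([[1.0], [0.0], [0.0]])          # stands in for an estimate of B
y = np.sin(X @ B_hat).ravel() + 0.1 * rng.standard_normal(50)

W = nw_weights(X @ B_hat, h=0.5)
m_hat = W @ y                                    # fitted values as in (2.7)
```

The diagonal entries are never dropped here, so every row has a positive
denominator; smoothing is done in $\hat q$ dimensions rather than $p$.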
Now we can make the comparison with NGLR. The main difference is the
definition of $RSS_1$. In Fan et al. (2001),

$$RSS_1 = \sum_{i=1}^{n}[y_i - \hat m(x_i)]^2.$$

Here $\hat m(x_i) = \sum_{j=1}^{n} K_h(x_i - x_j)\, y_j / \sum_{j=1}^{n} K_h(x_i - x_j)$
and $K_h(\cdot) = K(\cdot/h)/h^p$, with $K(\cdot)$ a $p$-dimensional kernel
function. With this $RSS_1$, the NGLR statistic is defined as

$$\frac{n}{2}\log\frac{RSS_0}{RSS_1} \approx \frac{n}{2}\,\frac{RSS_0 - RSS_1}{RSS_1}. \qquad (2.8)$$

We will see that this difference makes the proposed test behave very
differently from the original NGLR.


2.2 Bias-corrected version of the test statistic 


From Theorem 1 in Section 3 below, it can be seen clearly that there
exists an asymptotic bias converging to infinity in the limiting null
distribution, which has to be taken into account in practice. In this
subsection, we propose a bias-correction method to remove the asymptotic
bias. The motivation comes from the theoretical investigation of the test
statistic. The asymptotic bias of $T_n$ in (2.6) is caused by the slow
convergence rate of $\hat m(\hat B(\hat q)^T x_i)$ to $m(B^T x_i)$ in
$RSS_1$. To eliminate the bias asymptotically, we replace $RSS_1$ in (2.5)
by

$$\widetilde{RSS}_1 = \sum_{i=1}^{n} \big| [y_i - \tilde m(\hat B(\hat q)^T x_i)]\,[y_i - g(\hat\beta^T x_i, \hat\theta)] \big|, \qquad (2.9)$$

where $|\cdot|$ denotes the absolute value and the leave-one-out kernel
estimate $\tilde m$ of $m(B^T x_i)$ is applied. To be precise,

$$\tilde m(\hat B(\hat q)^T x_i) = \frac{\sum_{j \ne i} \mathcal{K}\{\hat B(\hat q)^T (x_i - x_j)/h\}\, y_j}{\sum_{j \ne i} \mathcal{K}\{\hat B(\hat q)^T (x_i - x_j)/h\}} =: \sum_{j \ne i} \tilde w_{ij}(\hat B(\hat q))\, y_j, \qquad (2.10)$$

which is different from the original kernel estimate
$\hat m(\hat B(\hat q)^T x_i)$ in (2.7). It is clear that
$\widetilde{RSS}_1$ is a sum of products of the residuals under the null and
alternative hypotheses. The bias-corrected version of the test statistic
$T_n$ in (2.6) is defined as:

$$\tilde T_n = \frac{n}{2}\,\frac{RSS_0 - \widetilde{RSS}_1}{\widetilde{RSS}_1} = \frac{n}{2}\,\frac{\sum_{i=1}^{n} |\hat e_i|\,\{|\hat e_i| - |y_i - \tilde m(\hat B(\hat q)^T x_i)|\}}{\sum_{i=1}^{n} |\hat e_i|\,|y_i - \tilde m(\hat B(\hat q)^T x_i)|}, \qquad (2.11)$$

where $\hat e_i = y_i - g(\hat\beta^T x_i, \hat\theta)$.
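A minimal sketch of the bias-corrected statistic follows: it combines
parametric residuals with a leave-one-out kernel fit on the projected
covariates. The function name, the quartic product kernel and the toy data
are assumptions made for illustration, and the code presumes that every
observation has at least one neighbor within distance $h$.

```python
import numpy as np


def bias_corrected_statistic(y, g_fit, Z, h):
    """(n/2)(RSS_0 - RSS_1~)/RSS_1~ with a leave-one-out kernel fit on Z."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    diff = (Z[:, None, :] - Z[None, :, :]) / h
    # Quartic product kernel (assumption); zero outside [-1, 1].
    K = np.prod(np.where(np.abs(diff) <= 1.0,
                         15.0 / 16.0 * (1.0 - diff**2) ** 2, 0.0), axis=2)
    np.fill_diagonal(K, 0.0)             # leave-one-out: drop the term j = i
    m_loo = (K @ y) / K.sum(axis=1)      # assumes each row has a neighbor within h
    e_hat = y - g_fit                    # parametric residuals
    rss0 = np.sum(e_hat**2)
    rss1_tilde = np.sum(np.abs((y - m_loo) * e_hat))
    return 0.5 * len(y) * (rss0 - rss1_tilde) / rss1_tilde


# Toy example with a known index and a hypothetical parametric fit.
Z = np.linspace(0.0, 1.0, 50)
y = 2.0 * Z + 0.1 * np.sin(37.0 * Z)
t_tilde = bias_corrected_statistic(y, g_fit=2.0 * Z, Z=Z, h=0.2)
print(t_tilde)
```

The absolute value is applied to each product of residuals, matching the
remark that absolute summands avoid negative values.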


To understand the difference between NGLR and the bias-corrected NGLR under
the null hypothesis, it suffices to compare the numerators of the two test
statistics, because the denominators converge to a constant in probability.
Recall that the main source of the bias term is the slow convergence rate of
the nonparametric estimate in $RSS_1$ to the regression function under the
null hypothesis. The parametric fit in $\widetilde{RSS}_1$ has, under the
null hypothesis, a faster convergence rate to the error than the
nonparametric fit, and thus reduces the bias asymptotically; at the same
time, the nonparametric fit still helps the test detect the alternative.

Bias correction was also investigated in Gozalo and Linton (2001) for an
additivity test motivated by Lagrange multiplier tests. However, there has
been no such investigation for model checking. Further, unlike Gozalo and
Linton (2001), we apply a leave-one-out kernel estimate, which proves
useful: without it, the asymptotic bias still exists, although it can be
reduced. We also use absolute values of the summands in $\widetilde{RSS}_1$
to avoid possible negative values.

2.3 Identification and estimation of B 

Note that the model (1.3) is a multi-index model. Here, we first estimate
the matrix $B$ for a given $q$ and then study how to select $q$
consistently. To this end, we adopt the outer product of gradients (OPG) and
minimum average variance estimation (MAVE) methods introduced by Xia et al.
(2002). OPG is easy to implement, and MAVE performs very well in general.
We briefly review these two methods below.
2.3.1 Outer product of gradients

Denote $m(B^T x) = E(Y|X = x)$, $\nabla m(B^T X) = \frac{\partial}{\partial X} m(B^T X)$
and $m'(B^T X) = \frac{\partial}{\partial (B^T X)} m(B^T X)$, so that
$\nabla m(B^T X) = B\, m'(B^T X)$. Note that

$$E\{\nabla m(B^T X)\nabla m(B^T X)^T\} = B\, E\{m'(B^T X)\, m'(B^T X)^T\}\, B^T$$

has $q$ nonzero eigenvalues. Therefore, $B$ is in the space spanned by the
$q$ eigenvectors of $E\{\nabla m(B^T X)\nabla m(B^T X)^T\}$ corresponding to
the largest $q$ eigenvalues. It follows that an estimate of $B$ can be
obtained by estimating $E\{\nabla m(B^T X)\nabla m(B^T X)^T\}$.

To implement the estimation, we first estimate the gradients by the local
linear smoother:

$$m(B^T x_i) \approx m(B^T x_j) + m'(B^T x_j)^T B^T (x_i - x_j) = a_j + b_j^T x_{ij},$$

where $a_j = m(B^T x_j)$, $b_j = B\, m'(B^T x_j)$ and $x_{ij} = x_i - x_j$.
Then the estimate $(\hat a_j, \hat b_j)$ can be obtained by solving the
following minimization problem:

$$\min_{a_j, b_j} \sum_{i=1}^{n} \mathcal{K}_h(B^T x_{ij})\{y_i - a_j - b_j^T x_{ij}\}^2,$$

where $\mathcal{K}_h(\cdot) = \mathcal{K}(\cdot/h)/h$ with
$\mathcal{K}(\cdot)$ being a $q$-dimensional kernel function and $h$ being a
bandwidth. The corresponding estimating equation can be rewritten as

$$\sum_{i=1}^{n} \mathcal{K}_h(B^T x_{ij})\,(1, x_{ij}^T)^T \{y_i - a_j - b_j^T x_{ij}\} = 0.$$

The estimate of $E\{\nabla m(B^T X)\nabla m(B^T X)^T\}$ can be constructed
as:

$$\hat\Sigma = \frac{1}{n}\sum_{j=1}^{n} \hat b_j \hat b_j^T. \qquad (2.12)$$

Thus, the $q$ eigenvectors associated with the largest $q$ eigenvalues of
$\hat\Sigma$ can be regarded as the estimate of the matrix $B$.
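The OPG steps above can be sketched as follows. This is a simplified
illustration, not the authors' implementation: since $B$ is unknown at this
stage, the local weights here are Gaussian weights on the full difference
$x_{ij}$, which is an assumption, as are the function name, the bandwidth
and the toy data.

```python
import numpy as np


def opg_directions(X, y, q, h):
    """Return the top-q eigenvectors and all eigenvalues (descending) of Sigma-hat."""
    n, p = X.shape
    sigma = np.zeros((p, p))
    for j in range(n):
        D = X - X[j]                                       # rows are x_ij = x_i - x_j
        w = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))    # Gaussian weights (assumption)
        sw = np.sqrt(w)
        A = np.hstack([np.ones((n, 1)), D]) * sw[:, None]  # weighted design (1, x_ij)
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)  # local linear fit at x_j
        b = coef[1:]                                       # estimated gradient b_j
        sigma += np.outer(b, b) / n                        # average of b_j b_j^T
    eigval, eigvec = np.linalg.eigh(sigma)                 # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    return eigvec[:, order[:q]], eigval[order]


# Toy check on a noise-free linear model, where every local gradient equals 3 * beta.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
beta = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
y = 3.0 * (X @ beta)
B_hat, lam = opg_directions(X, y, q=1, h=1.0)
```

The returned eigenvalues are the quantities on which the ridge-type ratio
estimate of the structural dimension operates.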
2.3.2 Minimum average variance estimation

The minimum average variance estimation (MAVE) method was first proposed by
Xia et al. (2002). This method needs no strong assumptions on the
probabilistic structure of $X$. At the population level, MAVE minimizes the
objective function

$$E\{Y - E(Y|B^T X)\}^2 = E\big(E[\{Y - E(Y|B^T X)\}^2 \,|\, B^T X]\big) \quad \text{subject to } B^T B = I_q.$$

This motivates an estimation procedure for
$\sigma_B^2(B^T X) = E[\{Y - E(Y|B^T X)\}^2 \,|\, B^T X]$. By using the local
linear smoother for $E(Y|B^T X)$, the quantity
$E[\{Y - E(Y|B^T X)\}^2 \,|\, B^T X = B^T x_j]$ can be estimated by
$\sum_{i=1}^{n}\{y_i - a_j - d_j^T B^T x_{ij}\}^2 \mathcal{K}_h(B^T x_{ij})$.
Finally, we take the average over all $j$ to get the estimated objective
function.

In sum, when a sample $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ is available, we
can estimate $B$ by minimizing

$$\sum_{j=1}^{n}\sum_{i=1}^{n} (y_i - a_j - d_j^T B^T x_{ij})^2\, \mathcal{K}_h(B^T x_{ij})$$

over all $B$ satisfying $B^T B = I_q$, $a_j$ and $d_j$, where $a_j$ and
$d_j$ play the roles of $m(B^T x_j)$ and $m'(B^T x_j)$. The details of the
algorithm can be found in Xia et al. (2002).
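The sample objective above can be evaluated for a fixed candidate $B$ as in
the following sketch, which profiles out $a_j$ and $d_j$ by weighted least
squares. It is not the full MAVE algorithm of Xia et al. (2002), which also
iterates over $B$. The Gaussian weights are used without the constant factor
in $\mathcal{K}_h$, and the function name, bandwidth and data are
illustrative assumptions.

```python
import numpy as np


def mave_objective(B, X, y, h):
    """Sum over j of the locally weighted squared residuals for a fixed B."""
    n = len(y)
    Z = X @ B                                              # projected covariates, n x q
    total = 0.0
    for j in range(n):
        D = Z - Z[j]                                       # rows are B^T x_ij
        w = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))    # unnormalized Gaussian weights
        sw = np.sqrt(w)
        design = np.hstack([np.ones((n, 1)), D])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        resid = y - design @ coef                          # y_i - a_j - d_j^T B^T x_ij
        total += np.sum(w * resid**2)
    return total


# Toy comparison: the true direction gives a much smaller objective value.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = 2.0 * X[:, 0] + 1.0                                    # depends on X only through e_1
obj_true = mave_objective(np.array([[1.0], [0.0], [0.0]]), X, y, h=0.5)
obj_wrong = mave_objective(np.array([[0.0], [1.0], [0.0]]), X, y, h=0.5)
```

Dropping the constant factor of the kernel rescales the objective but does
not change which $B$ minimizes it for a fixed bandwidth.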

2.4 Estimation of structural dimension q 

In the above subsection, the structural dimension $q$ is assumed to be
known. Now we introduce two techniques to estimate it: the ridge-type ratio
estimate (RRE) and a Bayesian information criterion (BIC) type estimate
motivated by Zhu et al. (2006) and Wang and Yin (2008).

Ridge-type ratio estimate (RRE). This estimate is similar in spirit to the
one proposed by Xia et al. (2014), but with a different ridge value. It is
very simple and easy to implement: it is the ratio of eigenvalues modified
by adding a positive ridge value $c$. Determine $q$ by

$$\hat q = \arg\min_{k=1,2,\ldots,p-1} \frac{\hat\lambda_{k+1} + c}{\hat\lambda_k + c},$$

where $\hat\lambda_1 \ge \ldots \ge \hat\lambda_p$ are the eigenvalues of
$\hat\Sigma$ in (2.12). The consistency of $\hat q$ will be proved in the
following, and the constant $c = 1/\sqrt{nh}$ is recommended.
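The ridge-type ratio rule translates directly into a few lines of code. The
sketch below is illustrative; the function name and the eigenvalues passed
to it are hypothetical.

```python
import numpy as np


def ridge_ratio_estimate(eigenvalues, c):
    """Return the k in {1, ..., p-1} minimizing (lambda_{k+1} + c) / (lambda_k + c)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    ratios = (lam[1:] + c) / (lam[:-1] + c)                    # one ratio per k
    return int(np.argmin(ratios)) + 1                          # convert index to k


n, h = 200, 0.35                       # hypothetical sample size and bandwidth
c = 1.0 / np.sqrt(n * h)               # recommended ridge value 1 / sqrt(n h)
q_hat = ridge_ratio_estimate([5.0, 0.01, 0.005, 0.001], c)
print(q_hat)
```

The ridge keeps the ratios of near-zero eigenvalues close to one, so the
minimum is attained at the gap between signal and noise eigenvalues.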
BIC estimate. MAVE is based on the residual sum of squares, so the simple
ridge-type ratio estimate above cannot be used to determine $q$ for the
MAVE-based estimate. Instead, we use the modified BIC developed by Wang and
Yin (2008):

$$\hat q = \arg\min_{k=1,2,\ldots,p} BIC_k = \arg\min_{k=1,2,\ldots,p} \left\{ \log\left(\frac{RSS_k}{n}\right) + \frac{\log(n)\,k}{\min(nh^k, \sqrt{n})} \right\},$$

where $k$ is the candidate dimension and $RSS_k$ is the corresponding
residual sum of squares:

$$RSS_k = \sum_{j=1}^{n}\sum_{i=1}^{n} (y_i - \hat a_j - \hat d_j^T \hat B(k)^T x_{ij})^2\, \mathcal{K}_h(\hat B(k)^T x_{ij}).$$

Under some mild conditions, the consistency of $\hat q$ has been shown in
Theorem 1 of Wang and Yin (2008).
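A minimal sketch of the modified BIC selection is given below. It assumes
the residual sums of squares $RSS_k$ have already been computed for each
candidate dimension; the function name and the numerical values are
hypothetical.

```python
import math


def bic_dimension(rss, n, h):
    """rss[k-1] holds RSS_k for k = 1, ..., p; return the minimizing k and all BIC values."""
    bic = [
        math.log(r / n) + math.log(n) * k / min(n * h**k, math.sqrt(n))
        for k, r in enumerate(rss, start=1)
    ]
    return bic.index(min(bic)) + 1, bic


# Illustrative values: the fit improves sharply from k = 1 to k = 2, then stalls.
q_hat, bic_values = bic_dimension([400.0, 100.0, 99.0, 98.5], n=200, h=0.5)
print(q_hat, bic_values)
```

The penalty grows with $k$ while the fit term levels off, so the criterion
stops at the smallest dimension that captures the drop in $RSS_k$.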


REMARK 1. The following consistency of $\hat q$ is established under both
the null and global alternative hypotheses. Under Conditions (C5) and (C7)
in the Appendix, the ridge-type ratio estimate satisfies $\hat q = q$ with
probability going to one as $n \to \infty$. Under the regularity conditions
given in Wang and Yin (2008), the MAVE-based estimate satisfies
$\hat q = q$ as $n \to \infty$. Therefore, for a $q \times q$ orthogonal
matrix $C$, $\hat B(\hat q)$ is a consistent estimate of $BC^T$. We will
show that $\hat q = 1$ under the local alternative hypothesis in Section 3.


3 Theoretical results 

In this section, the limiting null distributions of the proposed
model-adaptive enhanced NGLR test $T_n$ in (2.6) and of the bias-corrected
version $\tilde T_n$ in (2.11) are derived, and their asymptotic properties
under local and fixed alternative hypotheses are also investigated.


3.1 Limiting null distribution 


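Let $Z = \beta^T X$ and define

$$Q_1 = \Big\{\mathcal{K}(0) - \frac{1}{2}\int \mathcal{K}^2(u)\,du\Big\}\, \frac{\int \sigma^2(z)\,dz}{\int \sigma^2(z) f(z)\,dz}, \qquad (3.1)$$

$$\eta_0 = 2\int \sigma^4(z)\,dz \int [2\mathcal{K}(u) - \mathcal{K} * \mathcal{K}(u)]^2\,du, \qquad (3.2)$$

$$V_0 = \frac{\eta_0}{4\big(\int \sigma^2(z) f(z)\,dz\big)^2}, \qquad (3.3)$$

where the symbol $*$ denotes the convolution operator and
$\mathcal{K} * \mathcal{K}(x) = \int \mathcal{K}(t)\mathcal{K}(x - t)\,dt$.
Besides, $\sigma^2(z) = E(\varepsilon^2 | Z = z)$,
$\sigma^4(z) = [E(\varepsilon^2 | Z = z)]^2$ and $f(\cdot)$ is the density
function of $Z$. In order to obtain the estimates $\hat Q_1$ and $\hat V_0$
of $Q_1$ and $V_0$, we first list the consistent estimates of the
quantities $L_1 := \int \sigma^2(z) f(z)\,dz$,
$L_2 := \int \mathcal{K}^2(u)\,du \int \sigma^2(z)\,dz$,
$L_3 := \mathcal{K}(0) \int \sigma^2(z)\,dz$ and $\eta_0$ as follows: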

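$$\hat L_1 = \frac{1}{n}\sum_{i=1}^{n} [y_i - \hat m(\hat B(\hat q)^T x_i)]^2,$$

$$\hat L_2 = h \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}^2(\hat B(\hat q))\,[y_j - \hat m(\hat B(\hat q)^T x_j)]^2,$$

$$\hat L_3 = h \sum_{i=1}^{n} w_{ii}(\hat B(\hat q))\,[y_i - \hat m(\hat B(\hat q)^T x_i)]^2,$$

$$\hat\eta_0 = 2h \sum_{i=1}^{n}\sum_{j \ne i} [y_i - \hat m(\hat B(\hat q)^T x_i)]^2\,[y_j - \hat m(\hat B(\hat q)^T x_j)]^2 \Big\{ w_{ij}(\hat B(\hat q)) + w_{ji}(\hat B(\hat q)) - \sum_{k=1}^{n} w_{ki}(\hat B(\hat q))\, w_{kj}(\hat B(\hat q)) \Big\}^2.$$

Therefore, we have

$$\hat Q_1 = \frac{\hat L_3 - \hat L_2/2}{\hat L_1}, \qquad \hat V_0 = \frac{\hat\eta_0}{4\hat L_1^2}.$$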

The following theorem states the limiting null distribution of the proposed
test statistic $T_n$ in (2.6).

THEOREM 1. Under the null hypothesis in (1.1) and conditions (C1)-(C7) in
the Appendix, we have:

$$\sqrt{h}\,\Big(T_n - \frac{Q_1}{h}\Big) \Rightarrow N(0, V_0).$$
Plugging in the consistent estimates of $Q_1$ and $V_0$ and applying
Slutsky's Theorem, the following corollary is easily obtained.

COROLLARY 1. Under the null hypothesis in (1.1) and conditions (C1)-(C7) in
the Appendix, we have:

$$S_n := \frac{\sqrt{h}\,(T_n - \hat Q_1/h)}{\sqrt{\hat V_0}} \Rightarrow N(0, 1). \qquad (3.4)$$
Theorem 1 and Corollary 1 characterize the asymptotic normality of the
proposed test statistic. The null hypothesis $H_0$ is rejected when
$|S_n| > z_{1-\alpha/2}$, where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$
quantile of the distribution $N(0, 1)$. It is notable that with
homoscedastic errors, the asymptotic bias and variance in Theorem 1 are free
of any nuisance parameters and nuisance functions. To be precise,
$Q_1 = |\Omega| \cdot \{\mathcal{K}(0) - \int \mathcal{K}^2(u)\,du/2\}$ and
$V_0 = (|\Omega|/2) \int [2\mathcal{K}(u) - \mathcal{K} * \mathcal{K}(u)]^2\,du$,
where $|\Omega|$ denotes the Lebesgue measure of the support of $Z$. Thus,
similar to NGLR, the limiting null distribution of the proposed test
statistic is free of nuisance parameters and enjoys the Wilks phenomenon.
Theorem 1 and Corollary 1 show that
$n^{-1}(RSS_0 - RSS_1) = O_p(\sqrt{h})$ under $H_0$ and that the asymptotic
bias of $T_n$ is of order $h^{-1/2}$. However, for NGLR,
$n^{-1}(RSS_0 - RSS_1) = O_p(h^{p/2})$ and the asymptotic bias has the order
$h^{-p/2}$. Thus, the convergence rate is greatly improved, the effect of
dimensionality is largely eliminated and the bias is much reduced. As a
result, the proposed test controls the size better than the NGLR test. The
finite sample simulations later also confirm this claim.
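Under homoscedastic errors the constants above depend only on the kernel, so
they can be evaluated numerically once and for all. The sketch below does
this for the quartic kernel used in the simulation section, reporting the
factors multiplying $|\Omega|$. The grid, the step size and the variable
names are illustrative assumptions.

```python
import numpy as np


def biweight(u):
    """Quartic kernel 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


u = np.linspace(-3.0, 3.0, 6001)             # symmetric grid covering [-2, 2], the support of K * K
du = u[1] - u[0]
K = biweight(u)
KK = np.convolve(K, K, mode="same") * du     # numerical convolution K * K on the same grid

int_K2 = np.sum(K**2) * du                   # integral of K^2
int_diff2 = np.sum((2.0 * K - KK) ** 2) * du # integral of [2K - K*K]^2

q1_per_length = float(biweight(0.0)) - 0.5 * int_K2  # Q_1 / |Omega|
v0_per_length = 0.5 * int_diff2                      # V_0 / |Omega|
print(int_K2, int_diff2, q1_per_length, v0_per_length)
```

For the quartic kernel the first integral has the closed form $5/7$, which
gives a quick check on the numerical integration.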

We then state the asymptotic property of the bias-corrected test statistic
$\tilde T_n$ in (2.11) under the null hypothesis.

THEOREM 2. Under the same conditions as those in Theorem 1, we have

$$\sqrt{h}\,\tilde T_n \Rightarrow N(0, V_1),$$

where

$$V_1 = \frac{\int \sigma^4(z)\,dz \int \mathcal{K}^2(u)\,du}{2\big(\int \sigma^2(z) f(z)\,dz\big)^2}. \qquad (3.5)$$
Compared with Theorem 1, the significant difference is that there is no
asymptotic bias in the limiting null distribution of $\tilde T_n$. Another
notable feature of $\tilde T_n$ is that the asymptotic variance is also
reduced, since
$\int \mathcal{K}^2(u)\,du < \int [2\mathcal{K}(u) - \mathcal{K} * \mathcal{K}(u)]^2\,du$.
For a formal proof of this inequality, see, e.g., Dette and von Lieres und
Wilkau (2001). We will explore this point further in the power performance
studies. With homoscedastic errors, we have
$V_1 = |\Omega| \int \mathcal{K}^2(u)\,du/2$, where $|\Omega|$ is again the
Lebesgue measure of the support of $Z$. The above theorem shows that under
(and only under) conditional homoscedasticity, $V_1$ does not depend on
nuisance parameters and nuisance functions. In this case, the bias-corrected
version, like the NGLR statistic in (2.8), also enjoys the Wilks phenomenon.
This offers great convenience in implementing the bias-corrected version.
As with the original NGLR test, we need to estimate $V_1$ to obtain a
standardized version of $\tilde T_n$. To this end, notice that

Vi = 


n ~ l £"=1 \ei[yi - rh{B{q) T Xi ) 


Vi, 


here $\hat e_i = y_i - g(\hat\beta^T x_i, \hat\theta)$ and
$\tilde w_{ij}(\hat B(\hat q))$ has been defined in (2.10).

We now standardize $\tilde T_n$ in (2.11) to get a scale-invariant
statistic. According to Theorem 2, the standardized version of $\tilde T_n$
is


VhT n _ YJl= i N{|e»| ~ 1 Vi - m{B(g) T Xi) |} 


By the consistency of $\hat V_1$, an application of Slutsky's Theorem yields
the following corollary.

COROLLARY 2. Under the conditions in Theorem 2 and $H_0$, we have

$$R_n := \frac{\sqrt{h}\,\tilde T_n}{\sqrt{\hat V_1}} \Rightarrow N(0, 1),$$

where $N(0, 1)$ is the standard normal distribution.

From this corollary, we can calculate p-values easily by using the limiting
null distribution.
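A p-value based on the standard normal limit can be computed without any
resampling, for example as in the sketch below. The function name is
hypothetical, and whether a one-sided or a two-sided p-value is appropriate
is left to the user; the two-sided default is an assumption.

```python
import math


def normal_p_value(stat, two_sided=True):
    """p-value of a standardized statistic under the N(0, 1) limit."""
    if two_sided:
        return math.erfc(abs(stat) / math.sqrt(2.0))   # P(|N(0,1)| > |stat|)
    return 0.5 * math.erfc(stat / math.sqrt(2.0))      # P(N(0,1) > stat)


print(normal_p_value(1.959964))                    # close to 0.05
print(normal_p_value(1.644854, two_sided=False))   # close to 0.05
```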
3.2 Power study

We are now in a position to study the power of the tests under a sequence
of local alternative models of the form

$$H_{1n}: Y = g(\beta^T X, \theta) + C_n m(B^T X) + \varepsilon. \qquad (3.7)$$

Here $E(\varepsilon|X) = 0$, $E[m^2(B^T X)] < \infty$ and $\{C_n\}$ is a
constant sequence. Denote by $\alpha = (\beta, \theta)$ the minimizer

$$\alpha = \arg\min_{(\beta, \theta)} E\{g(\beta^T X, \theta) - m(X)\}^2$$

with $m(X) = E(Y|X)$. Under the null hypothesis, $\alpha$ is the true
parameter. Further, for the least squares estimate $\hat\alpha$, we always
have $\hat\alpha - \alpha = O_p(1/\sqrt{n})$. We now state the consistency
of $\hat q$ under the local alternative hypothesis (3.7).

LEMMA 1. Suppose that conditions (C1)-(C7) in the Appendix hold. Under the
local alternative with $C_n = n^{-1/2}h^{-1/4}$ and under the global
alternative, either the OPG-based or the MAVE-based estimate $\hat q$
equals 1 or $q$, respectively, with probability going to one as
$n \to \infty$.


To state the following theorem, we first define

$$\mu_1 = \frac{E[m^2(B^T X)]}{2\int \sigma^2(z) f(z)\,dz}. \qquad (3.8)$$


The asymptotic properties under global and local alternative hypotheses are
stated in the following theorem.

THEOREM 3. Given conditions (C1)-(C7) in the Appendix, we have the
following results.

(i) Under the global alternative of (1.3),

$$\sqrt{h}\,\Big(T_n - \frac{Q_1}{h}\Big)\Big/(nh^{1/2}) \Rightarrow C > 0,$$

where $C$ is a positive constant.

(ii) Under the local alternative hypotheses in (3.7) with
$C_n = n^{-1/2}h^{-1/4}$, we have

$$\sqrt{h}\,\Big(T_n - \frac{Q_1}{h}\Big) \Rightarrow N(\mu_1, V_0),$$

and $S_n \Rightarrow N(\mu_1/\sqrt{V_0}, 1)$.

This theorem indicates that under the global alternative hypothesis the
proposed test is consistent with asymptotic power 1, and that the test can
detect local alternatives distinct from the null at the rate of order
$n^{-1/2}h^{-1/4}$ rather than the rate of order $n^{-1/2}h^{-p/4}$ that the
NGLR can achieve. We further present the asymptotic properties of the
bias-corrected version $\tilde T_n$ in (2.11).

THEOREM 4. Assume that conditions (C1)-(C7) in the Appendix hold. Then we
have the following conclusions.

(i) Under the global alternative of (1.3),

$$\sqrt{h}\,\tilde T_n/(nh^{1/2}) \Rightarrow C_1 > 0,$$

where $C_1$ is a positive constant.

(ii) Under the local alternative hypotheses in (3.7) with
$C_n = n^{-1/2}h^{-1/4}$, we have

$$\sqrt{h}\,\tilde T_n \Rightarrow N(\mu_1, V_1),$$

and $R_n \Rightarrow N(\mu_1/\sqrt{V_1}, 1)$, where $\mu_1$ and $V_1$ have
been defined in (3.8) and (3.5), respectively.

Denote

$$O_1 = \frac{\mu_1}{\sqrt{V_0}} = \frac{E\{m^2(B^T X)\}}{\sqrt{2\int \sigma^4(z)\,dz \int [2\mathcal{K}(u) - \mathcal{K} * \mathcal{K}(u)]^2\,du}},$$

$$O_2 = \frac{\mu_1}{\sqrt{V_1}} = \frac{E\{m^2(B^T X)\}}{\sqrt{2\int \sigma^4(z)\,dz \int \mathcal{K}^2(u)\,du}}.$$

From the above theorems, it can also be shown that the asymptotic powers of
$T_n$ and $\tilde T_n$ are $1 - \Phi(z_\alpha - O_1)$ and
$1 - \Phi(z_\alpha - O_2)$, respectively, for alternatives that are distinct
from the null at the rate $n^{-1/2}h^{-1/4}$. Since $V_1 < V_0$,
$\tilde T_n$ is theoretically more powerful than $T_n$; that is,
$\tilde T_n$ is asymptotically more efficient than $T_n$ under $H_{1n}$.
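The asymptotic power expression $1 - \Phi(z_\alpha - O)$ can be evaluated as
in the sketch below. It takes $z_\alpha$ to be the upper-$\alpha$ quantile
of the standard normal distribution, which is an assumption about the
notation, and the function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def asymptotic_power(shift, alpha=0.05):
    """1 - Phi(z_alpha - shift), with z_alpha the upper-alpha normal quantile."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_alpha - shift)


def shift_ratio(int_K2, int_diff2):
    """O_2 / O_1 = sqrt(int [2K - K*K]^2 / int K^2), from the two definitions."""
    return math.sqrt(int_diff2 / int_K2)


# Illustrative numbers only: a shift O_1 = 1 and a ratio O_2 / O_1 = 1.2.
o1 = 1.0
o2 = 1.2 * o1
print(asymptotic_power(o1), asymptotic_power(o2))
```

Since the ratio $O_2/O_1$ depends only on the kernel, the power gain of the
bias-corrected statistic can be assessed before any data are seen.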
REMARK 2. Our preliminary simulation results based on $S_n$ in formula
(3.4) and on the bias-corrected statistic $R_n$ in (3.6) both show inflated
sizes of the tests. Therefore, a size adjustment is adopted:

$$\tilde S_n = \frac{S_n}{1 + 4n^{-4/5}} \quad \text{and} \quad \tilde R_n = \frac{R_n}{1 + 4n^{-4/5}}. \qquad (3.9)$$

Note that the size-adjustment value is asymptotically negligible as
$n \to \infty$, so that $\tilde S_n$ and $\tilde R_n$ are asymptotically
equivalent to $S_n$ and $R_n$. The adjustment was selected through
intensive simulation with many different values, and this choice is the one
we recommend. After such an adjustment, the new tests $\tilde S_n$ and
$\tilde R_n$ control the type I error much better, and have better power,
than those without the size adjustment.
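The size adjustment is a simple rescaling, sketched below with a
hypothetical function name and illustrative inputs.

```python
def size_adjust(stat, n):
    """Divide a standardized statistic by 1 + 4 n^{-4/5}."""
    return stat / (1.0 + 4.0 * n ** -0.8)


print(size_adjust(2.0, 100))      # noticeably shrunk at n = 100
print(size_adjust(2.0, 10**6))    # almost unchanged for very large n
```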


4 Simulation study 

Denote the adjusted test statistics in formula (3.9) based on the OPG and
MAVE methods by $\tilde S_n^{OPG}$ and $\tilde S_n^{MAVE}$, respectively,
and the corresponding bias-corrected versions by $\tilde R_n^{OPG}$ and
$\tilde R_n^{MAVE}$. In this section, three numerical studies are carried
out to examine the finite sample performance of the proposed tests. The
first study examines and compares the performance of our four proposed
tests. The effect of nonlinearity under the null hypothesis on the
performance of the tests is also considered there. The objective of the
second study is to examine how much improvement our method makes over the
NGLR test $T_n^{FZZ}$ proposed by Fan et al. (2001); the impact of
correlation among the components of the covariate $X$ is also discussed in
this study. Since the test $T_n^{GWZ}$ proposed by Guo et al. (2015) also
addresses the dimensionality problem in model checking, the last study
compares our tests with $T_n^{GWZ}$.

Study 1. The data are generated from the following models:

$$H_{11}: Y = \beta^T X + a_1 \exp(-0.1\beta^T X) + \varepsilon \quad \text{and} \quad \beta = (1, \ldots, 1, 0, 0)^T/\sqrt{p - 2},$$

$$H_{12}: Y = \beta^T X + 1.25 a_1 \times 2^{-\beta^T X} + \varepsilon \quad \text{and} \quad \beta = (\underbrace{0, \ldots, 0}_{p/2}, \underbrace{1, \ldots, 1}_{p/2})^T,$$

$$H_{13}: Y = \beta^T X + a_2 \cos(0.6\pi\beta^T X) + \varepsilon \quad \text{and} \quad \beta = (1, \ldots, 1)^T/\sqrt{p},$$

$$H_{14}: Y = 1.5\exp(0.5\beta^T X) + a_2 \cos(0.6\pi\beta^T X) + \varepsilon \quad \text{and} \quad \beta = (1, \ldots, 1)^T/\sqrt{p},$$

where $p = 8$. The components of the covariate
$X = (X_1, X_2, \ldots, X_p)^T$ are i.i.d., with $X$ generated from the
multivariate normal distribution $N(0, I_p)$, where $I_p$ is the
$p \times p$ identity matrix. The errors $\varepsilon \sim N(0, 1)$ and
$\varepsilon \sim t(5)$ are considered. Here, we set
$a_1 = 0, 0.1, \ldots, 0.5$ and $a_2 = 0, 0.2, \ldots, 1.0$. In these
models, $a_i = 0$, $i = 1, 2$, corresponds to the null hypothesis and
$a_i \ne 0$, $i = 1, 2$, to the alternative hypotheses. Under the
alternatives, the last two models are high-frequency and the first two are
not. We examine whether our tests are powerful for both types of models.
Throughout these simulations, unless otherwise specified, the kernel
function is taken to be $\mathcal{K}(u) = 15/16\,(1 - u^2)^2$ if
$|u| \le 1$ and 0 otherwise. The bandwidth is selected as $h = 1.5$ [...].
The sample sizes $n = 100$ and $200$ are considered and the significance
level is set to be $\alpha = 0.05$. Every simulation result is the average
of 2000 replications. Empirical sizes (type I errors) and simulated powers
of our tests against the alternatives $H_{1i}$, $i = 1, 2, 3, 4$, are
tabulated in Table 1 and Table 2 to compare the performance of the four
test statistics $\tilde S_n^{OPG}$, $\tilde R_n^{OPG}$, $\tilde S_n^{MAVE}$
and $\tilde R_n^{MAVE}$.
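The data-generating step of Study 1 can be sketched as follows for the
cosine model $H_{13}$, with the departure coefficient passed as the argument
`a`. The function name, the seed and the argument names are hypothetical;
only the model form, $p = 8$ and the two error distributions are taken from
the description above.

```python
import numpy as np


def generate_h13(n, a, p=8, error="normal", rng=None):
    """Draw (X, y) from Y = beta^T X + a cos(0.6 pi beta^T X) + eps."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.ones(p) / np.sqrt(p)                 # beta = (1, ..., 1)^T / sqrt(p)
    X = rng.standard_normal((n, p))                # X ~ N(0, I_p)
    if error == "normal":
        eps = rng.standard_normal(n)               # eps ~ N(0, 1)
    else:
        eps = rng.standard_t(5, size=n)            # eps ~ t(5)
    index = X @ beta
    y = index + a * np.cos(0.6 * np.pi * index) + eps
    return X, y, beta


# a = 0 corresponds to the null hypothesis, a != 0 to an alternative.
X, y, beta = generate_h13(n=100, a=0.0, rng=np.random.default_rng(2024))
```

Repeating this draw 2000 times per setting and recording the rejection
frequency gives the empirical size for `a = 0` and the simulated power
otherwise.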

Based on Table 1, we can make the following observations. First, and most importantly, for every combination of error distribution and sample size considered, both size-adjusted, bias-corrected versions $R_n^{OPG}$ and $R_n^{MAVE}$ control the empirical sizes well, keeping them very close to the pre-specified significance level 0.05. In contrast, the statistics $S_n^{OPG}$ and $S_n^{MAVE}$ without bias-correction do not always perform well even though size-adjustment is applied: their empirical sizes can be too large or too small and cannot be adjusted further. Second, as $a$ increases, the alternative deviates further from the null hypothesis and the simulated powers become higher. As expected, the empirical powers of these tests are also higher with larger sample sizes. Comparing the bias-corrected version $R_n^{OPG}$ (or $R_n^{MAVE}$) with the uncorrected version $S_n^{OPG}$ (or $S_n^{MAVE}$), we see that in most cases the bias-corrected version has higher power, although the two remain comparable. As for the OPG-based tests $S_n^{OPG}$ (or $R_n^{OPG}$) and the MAVE-based tests $S_n^{MAVE}$ (or $R_n^{MAVE}$), the MAVE-based tests generally have slightly higher empirical powers than the OPG-based tests in most situations. However, the differences are negligible; that is, the simulated powers of the OPG-based and MAVE-based tests are also comparable. Third, under the same alternative hypothesis, the error distribution has no significant influence, and the empirical sizes and powers are all acceptable, which suggests that our tests are robust. Similar conclusions can be drawn from Table 2, and thus we omit them.

In summary, the above conclusions indicate that the bias-correction is necessary. In the following simulations, we only report the results of the OPG-based bias-corrected version $R_n^{OPG}$ because of its simplicity, light computational burden and performance comparable with that of $R_n^{MAVE}$.

Table 1 and Table 2 about here

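Study 2: Consider the following regression models

$$H_{21}: Y = \beta_1^T X + a_1(\beta_2^T X)^2 + \varepsilon,$$

$$H_{22}: Y = \beta_1^T X + a_2|\beta_2^T X|^{1/2} + \varepsilon,$$

where $p = 4$ and $\beta_1 = (1, 1, \ldots, 1)^T/\sqrt{p}$, $\beta_2 = (0, \ldots, 0, \underbrace{1, \ldots, 1}_{p/2})^T/\sqrt{p/2}$,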

thus when $a_i \neq 0$, $i = 1, 2$, we have $q = 2$ and $B = (\beta_1, \beta_2)$. The covariate $X = (X_1, \ldots, X_p)^T$ is generated from the multivariate normal distribution $N(0, \Sigma_k)$, $k = 1, 2$, with $\Sigma_1 = I_p$ and $\Sigma_2 = \{0.2^{|i-j|}\}_{p \times p}$. The random error follows $\varepsilon \sim N(0, 2.56^2)$ or $\varepsilon \sim t(5)$; the latter is used to examine the effect of heavy-tailed errors on the performance of our proposed test statistics. We set $a_1 = 0, 0.2, \ldots, 1.0$ in $H_{21}$ and $a_2 = 0, 0.3, \ldots, 1.5$ in $H_{22}$. Empirical sizes and powers for $H_{21}$ and $H_{22}$ with $p = 4$ and $n = 100, 200$ are displayed in Table 3 to compare the results of $R_n^{OPG}$ and $T_n^{FZZ}$. For the NGLR test, as Hong and Lee (2013) mentioned, the asymptotic normal approximation might not perform well in finite samples because of its slow rate of convergence to the limiting null distribution. Thus, in addition to the asymptotic version of $T_n^{FZZ}$, we also apply the conditional bootstrap procedure developed by Hong and Lee (2013) to determine critical values.
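The correlated design $\Sigma_2 = \{0.2^{|i-j|}\}_{p \times p}$ can be simulated by multiplying standard normal draws by a Cholesky factor. The following is a minimal standard-library sketch; the helper names (`ar_covariance`, `cholesky`, `draw_covariate`) are hypothetical and not taken from the paper.

```python
import math
import random


def ar_covariance(p, rho=0.2):
    """Matrix with (i, j) entry rho^{|i-j|}; rho = 0.2 gives Sigma_2."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma positive definite)."""
    p = len(sigma)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def draw_covariate(low, rng):
    """One draw X = L Z with Z ~ N(0, I_p), so that Cov(X) = L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(low))]
```

Setting `rho=0.0` recovers the independent design $\Sigma_1 = I_p$.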

The following findings can be obtained from Table 3. First, the empirical sizes of both $R_n^{OPG}$ and the bootstrap version of $T_n^{FZZ}$ are under control and are robust to the covariance structure of $X$, whereas the asymptotic version of $T_n^{FZZ}$ does not control the size well. Second, as expected, the simulated powers of both tests become higher as $a$ increases and, for the same distribution of $X$, both tests are more powerful with larger sample sizes. Third, the proposed test $R_n^{OPG}$ has higher power than $T_n^{FZZ}$, which fails to detect the alternatives when the dimension of $X$ is large. This indicates that the model-adaptive enhancement of NGLR ($R_n^{OPG}$) is not significantly affected by the dimension of the covariate $X$, whereas the classic NGLR test inevitably suffers severely from the curse of dimensionality even when the bootstrap approximation is used.

Table [3] about here 

Figure 1 reports the power comparisons when $p = 4$ and $p = 8$ for the alternative model $H_{12}$. Here, $n = 100$, $X \sim N(0, I_p)$ and $\varepsilon \sim t(5)$ are considered. Plots a) and b) suggest that in both the $p = 4$ and $p = 8$ cases, $R_n^{OPG}$ has uniformly better power than the NGLR test. From plots c) and d), we can see that the change of dimensionality has no significant impact on $R_n^{OPG}$, since our test behaves like a local smoothing test as if $X$ were one-dimensional. However, the bootstrap version of $T_n^{FZZ}$ with $p = 4$ can be much more powerful than with $p = 8$.

Figure [1] about here 

Study 3: We generate the simulation data from the following regression models

$$H_{31}: Y = \beta_1^T X + a_1(\beta_2^T X)^2 + \varepsilon,$$

$$H_{32}: Y = \beta_1^T X + a_2(\beta_2^T X)^3 + \varepsilon.$$

For both cases, $p = 8$, $\beta_1 = (\underbrace{0, \ldots, 0}_{p/2}, 1, \ldots, 1)^T/\sqrt{p/2}$ and $\beta_2 = (1, 1, \ldots, 1)^T/\sqrt{p}$.

Here, $X = (X_1, X_2, \ldots, X_p)^T$ is generated from the multivariate normal distribution $N(0, I_p)$, where $I_p$ is the $p \times p$ identity matrix. The error term $\varepsilon$ comes from the univariate standard normal distribution $N(0, 1)$ or the Laplace distribution $\mathrm{Laplace}(0, 1)$ with probability density function $f(x) = \exp(-|x|)/2$. For the models under $H_{31}$ and $H_{32}$, $a$ is set to be $0, 0.05, 0.1, \ldots, 0.5$. The simulated sizes and powers with $n = 100$ for all of the combinations are plotted in Figure 2, where the test proposed by Guo et al. (2015) is denoted as $T_n^{GWZ}$.

From Figure 2, we can see that both tests $R_n^{OPG}$ and $T_n^{GWZ}$ have acceptable empirical sizes. Further, for the model $H_{31}$, plots a) and b) show that $R_n^{OPG}$ has uniformly higher power than $T_n^{GWZ}$ in all situations considered, though $T_n^{GWZ}$ also performs well. Under the alternative $H_{32}$, plots c) and d) show that $R_n^{OPG}$ performs comparably with $T_n^{GWZ}$. This study suggests that, in the limited simulations we conduct, the proposed test can work better than $T_n^{GWZ}$. As $T_n^{GWZ}$ is also a model-adaptive test, further study is needed to determine whether this is because NGLR can be more powerful than the test of Zheng (1996); this is left as a topic for future research.

Figure [2] about here 


5 Real data analysis 

A sample of 82 horse mussels is analysed in this section with our proposed test procedure. The data were part of a large ecological study of mussels (Cook, 1998; Cook and Weisberg, 1999) and were collected in the Marlborough Sounds off the coast of New Zealand. Recently, this dataset was used by Wang et al. (2014) to conduct transformed sufficient dimension reduction. Five variables are included in this dataset: the muscle mass $Y$, and the height $H$, the length $L$, the width $W$ and the mass $S$ of the mussel's shell, where $H$, $L$ and $W$ are in millimetres and $S$ is in grams. Before analysis, all variables are standardized separately. We are interested in whether the relationship between the response $Y$ and the covariates $X = (H, L, W, S)^T$ is linear and, if not, which model is suitable for fitting these data.

Figure 3 presents the scatter plots of the response $Y$ against the covariates $H, L, W, S$ and shows that the relationships between $Y$ and $H$, $L$ are both curved, although the relationships with $W$ and $S$ may be approximately fitted by linear mean functions. We can make a preliminary inference that a simple linear regression model is not appropriate for this dataset. Further, a test is conducted to confirm this. The p-values for the test statistics $R_n^{OPG}$ and $R_n^{MAVE}$ are both 0.001, which suggests that we can reject the null hypothesis and confirms our initial impression.

Figure [3] about here 


We further apply the Yeo-Johnson transformation to the data before analysis. Yeo and Johnson (2000) proposed the following transformation $\psi(\cdot, \cdot): R \times R \to R$, where

$$\psi(\lambda, u) = \begin{cases} \{(u+1)^{\lambda} - 1\}/\lambda & (u \ge 0,\ \lambda \neq 0), \\ \log(u+1) & (u \ge 0,\ \lambda = 0), \\ -\{(-u+1)^{2-\lambda} - 1\}/(2-\lambda) & (u < 0,\ \lambda \neq 2), \\ -\log(-u+1) & (u < 0,\ \lambda = 2). \end{cases}$$

Denote the transformed response as $Y^*$ and the transformed covariates as $X^* = (H^*, L^*, W^*, S^*)^T$. Here, $\lambda = 0.3$ is considered. Since the response and covariates are all positive, only the branch $\psi(\lambda, u) = \{(u+1)^{\lambda} - 1\}/\lambda$ is needed to transform the original data. The scatter plots of the transformed response $Y^*$ against the transformed covariates $X^* = (H^*, L^*, W^*, S^*)^T$ are depicted in Figure 4; all of them display approximately linear relationships. We therefore fit the linear regression $Y^* = X^{*T}\beta$ to this dataset and obtain $\hat\beta = (0.256, -0.025, 0.104, 0.634)^T$. Our proposed test is employed to check whether this model is adequate. The values of the test statistics $R_n^{OPG}$ and $R_n^{MAVE}$ are $-1.697$ and $-1.742$, respectively. The corresponding p-values are 0.090 and 0.082, both of which indicate that the null hypothesis cannot be rejected at the significance level $\alpha = 0.05$; that is, the linear regression model is adequate for the transformed data.
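The reported p-values are numerically consistent with a two-sided standard normal reference distribution for the test statistic. That reading is an inference from the numbers rather than something stated here, so the sketch below (with the hypothetical helper `two_sided_normal_p_value`) should be taken as a plausibility check only.

```python
import math


def two_sided_normal_p_value(statistic):
    """P(|Z| >= |statistic|) for Z ~ N(0, 1), via the complementary
    error function: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(statistic) / math.sqrt(2.0))
```

Applied to $-1.697$ and $-1.742$, this gives values that round to the reported 0.090 and 0.082.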


Figure 4 about here


Appendix. Proofs of theorems 

The following conditions are required for proving the theorems in Section 3.

(C1) $E|Y|^k < \infty$ and $E\|X\|^k < \infty$ for all $k > 0$; $E|\varepsilon| < \infty$, $E|g(\beta^T X, \theta)| < \infty$, $E\{[m(B^T X) - g(\beta^T X, \theta)]^2\} < \infty$, $E|m(B^T X)| < \infty$, $\sup E(X_l^2 \mid B^T X) < \infty$ for $l = 1, 2, \ldots, p$, and $E(\varepsilon^2 \mid B^T X) < \infty$, where $\varepsilon = Y - E(Y \mid B^T X)$; $E(X \mid y)$ and $E(XX^T \mid y)$ have bounded, continuous third-order derivatives.

(C2) The density function $f_\beta(z)$ of $\beta^T X$ on the support of $Z$ exists and has bounded derivatives up to order $r$, $r > 2$, for all $\tilde\beta$ with $|\tilde\beta - \beta| < \delta$, where $\delta > 0$, and satisfies

$$0 < \inf f(z) < \sup f(z) < 1.$$

(C3) The density function $f(X)$ of $X$ has bounded second derivatives and is bounded away from 0 in a neighborhood of 0. The density function $f(Y)$ of $Y$ has a bounded derivative and is bounded away from 0 on a compact support. The conditional densities $f_{X|Y}(\cdot)$ of $X$ given $Y$ and $f_{(X_0, X_l)|(Y_0, Y_l)}$ of $(X_0, X_l)$ given $(Y_0, Y_l)$ are bounded for all $l \ge 1$.

(C4) The conditional mean $E(Y \mid \tilde\beta^T X = z)$ has bounded derivatives up to order $r$, $r > 2$, for all $\tilde\beta$ with $|\tilde\beta - \beta| < \delta$, where $\delta > 0$. In addition, the derivatives of the link functions $g(\cdot)$ and $h(\cdot)$ up to order 3 are bounded.

(C5) The kernel function $\mathcal{K}(\cdot)$ is a bounded, differentiable and symmetric probability density function which satisfies the Lipschitz condition, and all moments of $\mathcal{K}(\cdot)$ exist. The bandwidth and trimming parameter satisfy $h \propto n^{-1/5}$ and $\epsilon < 1/20$, respectively. In addition, $\sqrt{nh} \to \infty$ and $nh^{1/2+2r} \to \infty$.

(C6) Under the null hypothesis and the local alternative hypothesis, $nh^2 \to \infty$; under the global alternative hypothesis, $nh^q \to \infty$.

(C7) The matrix $E\{\nabla m(B^T X)\nabla m(B^T X)^T\}$ is positive definite, where $\nabla m(\cdot)$, also written $m'(\cdot)$, denotes the gradient of the function $m(\cdot)$.

REMARK 3. Conditions (C1), (C3) and (C5) are required for MAVE. Condition (C2) is necessary for the asymptotic normality of our test statistics. The smoothness requirements on the link function in Condition (C4) can be relaxed to the existence of a bounded second-order derivative at the cost of more complicated technical proofs and the use of a smaller bandwidth, which is similar to the conditions in Xia (2006). Conditions (C5) and (C7) are assumed for OPG. In Condition (C6), $nh^2 \to \infty$ is a common assumption in nonparametric estimation.

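We first give the proof of Lemma 1 in Section 3.

Proof of Lemma 1. We only prove this lemma for the OPG-based estimate. The result for the MAVE-based estimate has been proven by Guo, Wang and Zhu (2015). Let $G(B^T X) = g(\beta^T X, \theta) + C_n m(B^T X)$. Thus

$$\nabla G(B^T X) = g'(\beta^T X, \theta)\beta + C_n B m'(B^T X).$$

It then follows that

$$E\{\nabla G(B^T X)\nabla G(B^T X)^T\} = E\{g'(\beta^T X, \theta)^2\}\beta\beta^T + 2C_n\beta E\{g'(\beta^T X, \theta)m'(B^T X)^T\}B^T + C_n^2 B E\{m'(B^T X)m'(B^T X)^T\}B^T.$$

Denote $\hat\Sigma = n^{-1}\sum_{j=1}^n \hat b_j \hat b_j^T$. We have

$$\begin{aligned} \hat\Sigma - E\{g'(\beta^T X, \theta)^2\}\beta\beta^T &= \hat\Sigma - \Sigma + \Sigma - E\{\nabla G(B^T X)\nabla G(B^T X)^T\} \\ &\quad + E\{\nabla G(B^T X)\nabla G(B^T X)^T\} - E\{g'(\beta^T X, \theta)^2\}\beta\beta^T \\ &= \frac{1}{n}\sum_{j=1}^n (\hat b_j \hat b_j^T - b_j b_j^T) + O_p\Big(\frac{1}{\sqrt{n}} + C_n\Big) = O_p\Big(\frac{1}{\sqrt{n}} + C_n\Big). \end{aligned}$$

Thus, we can get $\hat\lambda_i - \lambda_i = O_p(1/\sqrt{n} + C_n)$. Note that $\lambda_1 > 0$ and, for any $l > 1$, we have $\lambda_l = 0$. Consequently, under the condition that $C_n = o(c)$ and $c = o(1)$,

$$\begin{aligned} \frac{\hat\lambda_2 + c}{\hat\lambda_1 + c} - \frac{\hat\lambda_{l+1} + c}{\hat\lambda_l + c} &= \frac{\lambda_2 + c + O_p(C_n)}{\lambda_1 + c + O_p(C_n)} - \frac{\lambda_{l+1} + c + O_p(C_n)}{\lambda_l + c + O_p(C_n)} \\ &= \frac{c + O_p(C_n)}{\lambda_1 + c + O_p(C_n)} - \frac{c + O_p(C_n)}{c + O_p(C_n)} \to -1 < 0. \end{aligned}$$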

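Thus under the local alternative (3.7), Lemma 1 holds.

Further consider the case under the global alternative (1.3). We have

$$\begin{aligned} \hat\Sigma - E\{\nabla m(B^T X)\nabla m(B^T X)^T\} &= \hat\Sigma - \Sigma + \Sigma - E\{\nabla m(B^T X)\nabla m(B^T X)^T\} \\ &= \frac{1}{n}\sum_{j=1}^n (\hat b_j \hat b_j^T - b_j b_j^T) + O_p\Big(\frac{1}{\sqrt{n}}\Big) = O_p\Big(\frac{1}{\sqrt{n}}\Big). \end{aligned}$$

Therefore, $\hat\lambda_i - \lambda_i = O_p(1/\sqrt{n})$. Note that $\lambda_l > 0$ when $l \le q$ and, for any $l > q$, we have $\lambda_l = 0$. Since $c = 1/\sqrt{nh}$ is recommended, $1/\sqrt{n} = o(c)$ and $c = o(1)$. For $l > q$,

$$\begin{aligned} \frac{\hat\lambda_{q+1} + c}{\hat\lambda_q + c} - \frac{\hat\lambda_{l+1} + c}{\hat\lambda_l + c} &= \frac{\lambda_{q+1} + c + O_p(1/\sqrt{n})}{\lambda_q + c + O_p(1/\sqrt{n})} - \frac{\lambda_{l+1} + c + O_p(1/\sqrt{n})}{\lambda_l + c + O_p(1/\sqrt{n})} \\ &= \frac{c + O_p(1/\sqrt{n})}{\lambda_q + c + O_p(1/\sqrt{n})} - \frac{c + O_p(1/\sqrt{n})}{c + O_p(1/\sqrt{n})} \to -1 < 0; \end{aligned}$$

when $1 \le l < q$,

$$\begin{aligned} \frac{\hat\lambda_{q+1} + c}{\hat\lambda_q + c} - \frac{\hat\lambda_{l+1} + c}{\hat\lambda_l + c} &= \frac{\lambda_{q+1} + c + O_p(1/\sqrt{n})}{\lambda_q + c + O_p(1/\sqrt{n})} - \frac{\lambda_{l+1} + c + O_p(1/\sqrt{n})}{\lambda_l + c + O_p(1/\sqrt{n})} \\ &= \frac{c + O_p(1/\sqrt{n})}{\lambda_q + c + O_p(1/\sqrt{n})} - \frac{\lambda_{l+1} + c + O_p(1/\sqrt{n})}{\lambda_l + c + O_p(1/\sqrt{n})} \to -\frac{\lambda_{l+1}}{\lambda_l} < 0. \end{aligned}$$

Therefore, we can conclude that under the global alternative (1.3), $\hat q = q$ holds with a probability going to one. The proof is finished. □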
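The ratio comparisons in this proof suggest that $\hat q$ is chosen as the index minimising the ridge-type ratio $(\hat\lambda_{l+1} + c)/(\hat\lambda_l + c)$. Assuming that definition (it is stated elsewhere in the paper, not in this appendix), a minimal sketch of the selection rule looks as follows; the function name `ridge_ratio_estimate` is hypothetical.

```python
def ridge_ratio_estimate(eigenvalues, c):
    """Return the l in {1, ..., p-1} that minimises
    (lambda_{l+1} + c) / (lambda_l + c), eigenvalues sorted decreasingly."""
    lam = sorted(eigenvalues, reverse=True)
    # ratios[k] corresponds to l = k + 1 (lam is 0-indexed).
    ratios = [(lam[l] + c) / (lam[l - 1] + c) for l in range(1, len(lam))]
    return 1 + min(range(len(ratios)), key=ratios.__getitem__)
```

The ridge $c$ keeps the ratios of near-zero eigenvalues close to one, so the minimum is attained at the last eigenvalue that is clearly separated from zero.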

The following lemmas are used to prove the theorems in Section 3.

LEMMA 2. Under Conditions (C1)-(C7), we have

$$\frac{1}{n}RSS_1 = \int \sigma^2(z)f(z)dz + o_p(1),$$

where $Z = B^T X$.

Proof of Lemma 2. The term $RSS_1/n$ can be written as

$$\begin{aligned} \frac{1}{n}RSS_1 &= \frac{1}{n}\sum_{i=1}^n \{\varepsilon_i + m(B^T x_i) - \hat m(\hat B^T x_i)\}^2 \\ &= \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 + \frac{2}{n}\sum_{i=1}^n \{m(B^T x_i) - \hat m(\hat B^T x_i)\}\varepsilon_i + \frac{1}{n}\sum_{i=1}^n \{m(B^T x_i) - \hat m(\hat B^T x_i)\}^2 \\ &=: A_1 + A_2 + A_3. \end{aligned} \tag{A.1}$$

Let $Z = B^T X$. The term $A_1$ has the following representation:

$$A_1 = E[E(\varepsilon^2 \mid Z)] + o_p(1) = \int \sigma^2(z)f(z)dz + o_p(1). \tag{A.2}$$

For the term $A_3$, it can be easily derived that

$$A_3 \le \frac{2}{n}\sum_{i=1}^n \{m(B^T x_i) - m(\hat B^T x_i)\}^2 + \frac{2}{n}\sum_{i=1}^n \{m(\hat B^T x_i) - \hat m(\hat B^T x_i)\}^2 =: A_{31} + A_{32}. \tag{A.3}$$

The result of Härdle et al. (2004) shows that $\|\hat B - B\| = O_p(1/\sqrt{n})$. Note that $\sum_{i=1}^n \|x_i\|^2 = O_p(n)$. Then

$$|A_{31}| \le m'^2(\tilde z)\|\hat B - B\|^2 \cdot \frac{1}{n}\sum_{i=1}^n \|x_i\|^2 + o_p\Big(\frac{1}{n}\Big) = O_p\Big(\frac{1}{n}\Big), \tag{A.4}$$

where $\tilde z$ lies between $B^T x_i$ and $\hat B^T x_i$. By Theorem 4.8 in Härdle et al. (2004), we can derive

$$|A_{32}| \le c \cdot \frac{1}{n}\sum_{i=1}^n \Big\{[\hat m(\hat B^T x_i) - E\hat m(\hat B^T x_i)]^2 + [E\hat m(\hat B^T x_i) - m(\hat B^T x_i)]^2\Big\} \cdot f(\hat B^T x_i). \tag{A.5}$$

Combining (A.3), (A.4) and (A.5) yields that

$$A_3 = O_p\Big(\frac{1}{n}\Big) + O_p\Big(\frac{1}{nh} + h^{2r}\Big). \tag{A.6}$$

Similar to $A_3$, the term $A_2$ can be bounded by

$$|A_2| \le \frac{c}{n}\sum_{i=1}^n \Big\{|\hat m(\hat B^T x_i) - E\hat m(\hat B^T x_i)| + |E\hat m(\hat B^T x_i) - m(B^T x_i)|\Big\} \cdot |\varepsilon_i| = O_p\Big(\frac{1}{\sqrt{nh}} + h^r\Big). \tag{A.7}$$

Taking the formulae (A.2), (A.6) and (A.7) into (A.1), as $nh \to \infty$ and $h \to 0$, $RSS_1/n$ converges to a constant in probability:

$$\frac{1}{n}RSS_1 = \int \sigma^2(z)f(z)dz + o_p(1).$$

The proof of Lemma 2 is finished. □
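To make the quantity $RSS_1/n$ concrete, the following sketch computes a Nadaraya-Watson fit on a one-dimensional index $z_i = \hat B^T x_i$ with the kernel $\mathcal{K}(u) = \frac{15}{16}(1-u^2)^2$ used in the simulations, and then averages the squared residuals. It is a simplified illustration for a scalar index only; the names `biweight`, `nw_fit` and `rss_over_n`, the `leave_one_out` switch and the fallback for empty windows are assumptions of this sketch rather than details from the paper.

```python
def biweight(u):
    """K(u) = 15/16 (1 - u^2)^2 on |u| <= 1, and 0 otherwise."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def nw_fit(z, y, h, leave_one_out=False):
    """Nadaraya-Watson estimate of E(Y | Z = z_i) at every sample point."""
    fitted = []
    for i, zi in enumerate(z):
        num = den = 0.0
        for j, zj in enumerate(z):
            if leave_one_out and i == j:
                continue
            w = biweight((zi - zj) / h)
            num += w * y[j]
            den += w
        # Assumption of this sketch: fall back to y_i if the window is empty.
        fitted.append(num / den if den > 0.0 else y[i])
    return fitted


def rss_over_n(y, fitted):
    """Average squared residual, i.e. RSS / n."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y)
```

Including the $j = i$ term is what produces the $\mathcal{K}(0)$ contribution to the bias in the uncorrected statistic; the `leave_one_out=True` variant drops that term.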


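LEMMA 3. Denote $\alpha = (\beta, \theta)$ and let $\tilde\alpha$ be a parameter value which minimizes $E\{g(\beta^T X, \theta) - \cdots\}$. Suppose Conditions (C1)-(C7) hold. Then

$$\hat\alpha - \tilde\alpha = O_p\Big(\frac{1}{\sqrt{n}}\Big).$$

Proof of Lemma 3. Note that $\tilde\alpha$ can be estimated through OLS, that is,

$$\hat\alpha = \arg\min_{\alpha} \frac{1}{n}\sum_{i=1}^n \{y_i - g(\beta^T x_i, \theta)\}^2.$$

Further, denote $g(\alpha) := g(\beta^T x_i, \theta)$. We can get

$$\begin{aligned} 0 &= \frac{1}{n}\sum_{i=1}^n \frac{\partial g(\alpha)}{\partial\alpha}\Big|_{\alpha = \hat\alpha}(y_i - g(\hat\alpha)) \\ &= \frac{1}{n}\sum_{i=1}^n \frac{\partial g(\alpha)}{\partial\alpha}\Big|_{\alpha = \tilde\alpha}(y_i - g(\tilde\alpha)) + \Big\{\frac{1}{n}\sum_{i=1}^n (y_i - g(\alpha_1))\frac{\partial^2 g(\alpha)}{\partial\alpha\partial\alpha^T}\Big|_{\alpha = \alpha_1} - \frac{1}{n}\sum_{i=1}^n \frac{\partial g(\alpha)}{\partial\alpha}\Big|_{\alpha = \alpha_1}\frac{\partial g(\alpha)}{\partial\alpha^T}\Big|_{\alpha = \alpha_1}\Big\}(\hat\alpha - \tilde\alpha), \end{aligned}$$

where $\alpha_1$ lies between $\hat\alpha$ and $\tilde\alpha$. According to the above formula, we can obtain that

$$\begin{aligned} \hat\alpha - \tilde\alpha &= \Big\{\frac{1}{n}\sum_{i=1}^n \frac{\partial g(\alpha)}{\partial\alpha}\Big|_{\alpha = \alpha_1}\frac{\partial g(\alpha)}{\partial\alpha^T}\Big|_{\alpha = \alpha_1} - \frac{1}{n}\sum_{i=1}^n (y_i - g(\alpha_1))\frac{\partial^2 g(\alpha)}{\partial\alpha\partial\alpha^T}\Big|_{\alpha = \alpha_1}\Big\}^{-1} \frac{1}{n}\sum_{i=1}^n \frac{\partial g(\alpha)}{\partial\alpha}\Big|_{\alpha = \tilde\alpha}(y_i - g(\tilde\alpha)) \\ &=: W_1^{-1}W_2. \end{aligned} \tag{A.8}$$

It is not difficult to verify that

$$W_1 = E[g'(\tilde\alpha)g'(\tilde\alpha)^T] + o_p(1), \tag{A.9}$$

where $g'(\cdot) = \partial g(\alpha)/\partial\alpha$ denotes the gradient of the function $g(\cdot)$. Afterwards, $W_2$ can be rewritten as

$$W_2 = \frac{1}{n}\sum_{i=1}^n \frac{\partial g(\tilde\alpha)}{\partial\alpha}(m(x_i) - g(\tilde\alpha)) + \frac{1}{n}\sum_{i=1}^n \frac{\partial g(\tilde\alpha)}{\partial\alpha}\varepsilon_i.$$

Recalling the definition of $\tilde\alpha$, we can derive that $E(W_2) = 0$ and $\mathrm{Var}(W_2) = O(1/n)$. Further, we get

$$W_2 = O_p\Big(\frac{1}{\sqrt{n}}\Big). \tag{A.10}$$

Combining (A.8), (A.9) and (A.10), we can easily conclude that

$$\hat\alpha - \tilde\alpha = O_p\Big(\frac{1}{\sqrt{n}}\Big).$$

The proof of Lemma 3 is completed. □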


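LEMMA 4. Suppose Conditions (C1)-(C7) hold. Under either $H_0$ or $H_{1n}$ in (3.7) with $C_n \to 0$ as $n \to \infty$, we have

$$\frac{1}{n}\widetilde{RSS}_1 = \int \sigma^2(z)f(z)dz + o_p(1),$$

where $Z = B^T X$.

Proof of Lemma 4. Let $\Delta(x_i) = m(B^T x_i) - g(\tilde\beta^T x_i, \tilde\theta)$, where $\tilde\alpha = (\tilde\beta, \tilde\theta)^T$ is defined in Lemma 3; hence, $y_i = \varepsilon_i + m(x_i) = \varepsilon_i + \Delta(x_i) + g(\tilde\beta^T x_i, \tilde\theta)$. Consider the following decomposition:

$$\begin{aligned} &[y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)] \\ &= [\varepsilon_i + m(B^T x_i) - \hat m(\hat B(\hat q)^T x_i)][\varepsilon_i + \Delta(x_i) + g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta)] \\ &= \varepsilon_i^2 + \varepsilon_i\Delta(x_i) + \varepsilon_i[g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta)] + \varepsilon_i[m(B^T x_i) - \hat m(\hat B(\hat q)^T x_i)] \\ &\quad + \Delta(x_i)[m(B^T x_i) - \hat m(\hat B(\hat q)^T x_i)] + [m(B^T x_i) - \hat m(\hat B(\hat q)^T x_i)][g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta)] \\ &=: A_1 + A_2 + A_3 + A_4 + A_5 + A_6. \end{aligned} \tag{A.11}$$

From Lemma 3, $\hat\alpha - \tilde\alpha = O_p(1/\sqrt{n})$. Thus, for the term $A_3$, we have

$$A_3 = \varepsilon_i g'(\alpha_2)^T(\tilde\alpha - \hat\alpha) = O_p\Big(\frac{1}{\sqrt{n}}\Big), \tag{A.12}$$

where $\alpha_2$ lies between $\hat\alpha$ and $\alpha_0$.

As for the term $A_4$, similarly to the proof for the term $A_2$ in (A.7), we can obtain

$$|A_4| \le O_p\Big(\frac{1}{\sqrt{nh}} + h^r\Big), \tag{A.13}$$

thus, when $n \to \infty$, $h \to 0$ and $nh \to \infty$, we have $A_4 = o_p(1)$. Further, $A_5 = o_p(1)$ can be obtained as well.

Combining (A.12) and (A.13), we can conclude that $A_6 = o_p(1)$. Through the above proof, the formula (A.11) can be rewritten as

$$[y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)] = \varepsilon_i^2 + \varepsilon_i\Delta(x_i) + o_p(1). \tag{A.14}$$

Consider the two cases under the null and alternative hypotheses.

A). Under the null hypothesis $H_0$, $\Delta(x_i) = 0$. Then for $i = 1, \ldots, n$, when $n \to \infty$, we have $[y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)] = \varepsilon_i^2 + o_p(1) > 0$; thus, for large $n$,

$$\begin{aligned} \frac{1}{n}\widetilde{RSS}_1 &= \frac{1}{n}\sum_{i=1}^n \big|[y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)]\big| \\ &= \frac{1}{n}\sum_{i=1}^n [y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)] + o_p(1) \\ &= \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 + o_p(1) = \int \sigma^2(z)f(z)dz + o_p(1). \end{aligned}$$

B). Under the local alternative $H_{1n}$ in (3.7), $\Delta(x_i) = C_n m(B^T x_i)$ and

$$\frac{1}{n}\widetilde{RSS}_1 = \frac{1}{n}\sum_{i=1}^n \big|\varepsilon_i^2 + C_n\varepsilon_i m(B^T x_i)\big| + o_p(1).$$

Since $C_n \to 0$ when $n \to \infty$, it can be easily obtained that $\Delta(x_i) \to 0$; further we have

$$\frac{1}{n}\widetilde{RSS}_1 = \int \sigma^2(z)f(z)dz + o_p(1).$$

The proof of Lemma 4 is finished. □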

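Proof of Theorem 1. Under the null hypothesis in (1.1), we have $m(x) = g(\beta^T x, \theta)$. Write the numerator of $T_n$ in (2.6) as $H_n = (RSS_0 - RSS_1)/n$. Then

$$\begin{aligned} H_n &= \frac{1}{n}\sum_{i=1}^n [y_i - g(\hat\beta^T x_i, \hat\theta)]^2 - \frac{1}{n}\sum_{i=1}^n [y_i - \hat m(\hat B^T x_i)]^2 \\ &= \frac{1}{n}\sum_{i=1}^n [2y_i - g(\hat\beta^T x_i, \hat\theta) - \hat m(\hat B^T x_i)][\hat m(\hat B^T x_i) - g(\hat\beta^T x_i, \hat\theta)] \\ &= \frac{1}{n}\sum_{i=1}^n [2\varepsilon_i + g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta) + m(x_i) - \hat m(\hat B^T x_i)] \\ &\qquad\quad \times [\hat m(\hat B^T x_i) - m(x_i) + g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)] \\ &= \frac{1}{n}\Big\{2\sum_{i=1}^n \varepsilon_i[\hat m(\hat B^T x_i) - m(x_i)] + 2\sum_{i=1}^n \varepsilon_i[g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)] \\ &\qquad + \sum_{i=1}^n [g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)]^2 - \sum_{i=1}^n [m(x_i) - \hat m(\hat B^T x_i)]^2\Big\} \\ &=: B_1 + B_2 + B_3 - B_4. \end{aligned} \tag{A.15}$$

As to the term $B_2$, similarly to the proof for $A_3$ in (A.12), we have

$$B_2 = \frac{2}{n}\sum_{i=1}^n \varepsilon_i g'(\alpha_2)^T(\alpha_0 - \hat\alpha) = O_p\Big(\frac{1}{n}\Big), \tag{A.16}$$

where $\alpha_2$ lies between $\hat\alpha$ and $\alpha_0$. For the term $B_3$, we have

$$B_3 = (\hat\alpha - \alpha_0)^T \cdot \frac{1}{n}\sum_{i=1}^n g'(\alpha_2)g'(\alpha_3)^T(\hat\alpha - \alpha_0) = O_p\Big(\frac{1}{n}\Big), \tag{A.17}$$

where $\alpha_3$ is between $\hat\alpha$ and $\alpha_0$.

Consider the term $B_4$. Recalling the formula (2.7), we can obtain

$$\begin{aligned} B_4 &= \frac{1}{n}\sum_{i=1}^n \Big\{\sum_{j=1}^n w_{ij}(\hat B)(y_j - m(x_i))\Big\}^2 = \frac{1}{n}\sum_{i=1}^n \Big\{\sum_{j=1}^n w_{ij}(\hat B)[m(x_j) + \varepsilon_j - m(x_i)]\Big\}^2 \\ &= \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\hat B)\varepsilon_j\sum_{k=1}^n w_{ik}(\hat B)\varepsilon_k + \frac{2}{n}\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\hat B)[m(x_j) - m(x_i)]\sum_{k=1}^n w_{ik}(\hat B)\varepsilon_k \\ &\quad + \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n w_{ij}(\hat B)[m(x_j) - m(x_i)]\sum_{k=1}^n w_{ik}(\hat B)[m(x_k) - m(x_i)] \\ &=: B_{41} + B_{42} + B_{43}. \end{aligned} \tag{A.18}$$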

For the term $B_{42}$, we first calculate the following quantity. For any function $l(x)$ of $x$, we have

$$\begin{aligned} l(x_i) - \sum_{j=1}^n w_{ij}(B)l(x_j) &= \sum_{j=1}^n w_{ij}(B)\{l(x_i) - l(x_j)\} \\ &= \frac{1}{nhf(z_i)}\sum_{j=1}^n \mathcal{K}\Big(\frac{z_i - z_j}{h}\Big)\{l(x_i) - l(x_j)\} \\ &= \frac{1}{f(z_i)}\int \mathcal{K}(u)\{E(l(x_i) \mid z_i) - E(l(X) \mid z_i - hu)\}f(z_i - hu)du \\ &= -h^r\kappa_r\frac{(E(l(x_i) \mid z_i)f(z_i))^{(r)} - E(l(x_i) \mid z_i)f^{(r)}(z_i)}{f(z_i)} \\ &=: h^r M(z_i), \end{aligned} \tag{A.19}$$

where $u = B^T(x_i - x)/h$ and a Taylor expansion is used in the penultimate step. Also, the superscript $(r)$ denotes the $r$th derivative. Thus,

$$\begin{aligned} l(x_i) - \sum_{j=1}^n w_{ij}(\hat B)l(x_j) &= \sum_{j=1}^n w_{ij}(B)\{l(x_i) - l(x_j)\} + \sum_{j=1}^n [w_{ij}(\hat B) - w_{ij}(B)]\{l(x_i) - l(x_j)\} \\ &= h^r M(z_i) + o_p(h^r). \end{aligned}$$

Together with Lemma 3.3b in Zheng (1996) or Lemma 2 in Guo et al. (2015), it can be easily derived that

$$\frac{1}{n}\sum_{i=1}^n\sum_{k=1}^n w_{ik}(\hat B)\varepsilon_k M(z_i) = O_p\Big(\frac{1}{\sqrt{n}}\Big). \tag{A.20}$$

Combining the formulae (A.19) and (A.20), we have

$$B_{42} = O_p\Big(\frac{h^r}{\sqrt{n}}\Big).$$

Similarly, we can obtain

$$B_{43} = O_p(h^{2r}).$$


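Turn to the term $B_{41}$ in (A.18):

$$B_{41} = \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n w_{ij}^2(\hat B)\varepsilon_j^2 + \frac{1}{n}\sum_{j=1}^n\sum_{k \neq j}\varepsilon_j\varepsilon_k\sum_{i=1}^n w_{ij}(\hat B)w_{ik}(\hat B) =: B_{41,1} + B_{41,2}.$$

As for $B_{41,1}$, we have

$$B_{41,1} = \frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n \frac{\mathcal{K}^2((z_i - z_j)/h)}{n^2h^2f^2(z_i)}\varepsilon_j^2 = \frac{1}{nh}\int \sigma^2(z)dz\int \mathcal{K}^2(u)du + o_p\Big(\frac{1}{n\sqrt{h}}\Big). \tag{A.21}$$

Thus, the term $B_4$ in (A.18) can be written as

$$B_4 = \frac{1}{n}\sum_{j=1}^n\sum_{k \neq j}\varepsilon_j\varepsilon_k\sum_{i=1}^n w_{ij}(\hat B)w_{ik}(\hat B) + \frac{1}{nh}\int \sigma^2(z)dz\int \mathcal{K}^2(u)du + O_p\Big(\frac{h^r}{\sqrt{n}}\Big) + O_p(h^{2r}) + o_p\Big(\frac{1}{n\sqrt{h}}\Big). \tag{A.22}$$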

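Recall the term $B_1$ in (A.15):

$$\begin{aligned} B_1 &= \frac{2}{n}\sum_{i=1}^n \varepsilon_i\sum_{j=1}^n w_{ij}(\hat B)[m(x_j) + \varepsilon_j - m(x_i)] \\ &= \frac{2}{n}\sum_{i=1}^n \varepsilon_i\sum_{j=1}^n w_{ij}(\hat B)[m(x_j) - m(x_i)] + \frac{2}{n}\sum_{i=1}^n \varepsilon_i^2 w_{ii}(\hat B) + \frac{2}{n}\sum_{i=1}^n\sum_{j \neq i}\varepsilon_i\varepsilon_j w_{ij}(\hat B) \\ &=: B_{11} + B_{12} + B_{13}. \end{aligned}$$

Recalling the result in (A.19), it can be easily derived that

$$B_{11} = O_p\Big(\frac{h^r}{\sqrt{n}}\Big).$$

For the term $B_{12}$,

$$B_{12} = \frac{2\mathcal{K}(0)}{nh} \cdot \frac{1}{n}\sum_{i=1}^n \frac{\varepsilon_i^2}{f(z_i)} + o_p\Big(\frac{1}{n\sqrt{h}}\Big) = \frac{2\mathcal{K}(0)}{nh}\int \sigma^2(z)dz + o_p\Big(\frac{1}{n\sqrt{h}}\Big). \tag{A.23}$$

Therefore,

$$B_1 = \frac{2}{n}\sum_{i=1}^n\sum_{j \neq i}\varepsilon_i\varepsilon_j w_{ij}(\hat B) + \frac{2\mathcal{K}(0)}{nh}\int \sigma^2(z)dz + o_p\Big(\frac{1}{n\sqrt{h}}\Big) + O_p\Big(\frac{h^r}{\sqrt{n}}\Big). \tag{A.24}$$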


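Taking formulae (A.24), (A.16), (A.17) and (A.22) into the formula (A.15), we can obtain

$$\begin{aligned} H_n &= \frac{1}{n}\sum_{i=1}^n\sum_{j \neq i}\varepsilon_i\varepsilon_j\Big\{w_{ij}(\hat B) + w_{ji}(\hat B) - \sum_{k=1}^n w_{ki}(\hat B)w_{kj}(\hat B)\Big\} + \frac{2Q_1\int \sigma^2(z)f(z)dz}{nh} + o_p\Big(\frac{1}{n\sqrt{h}}\Big) \\ &= B_{13} - B_{41,2} + \frac{2Q_1\int \sigma^2(z)f(z)dz}{nh} + o_p\Big(\frac{1}{n\sqrt{h}}\Big), \end{aligned} \tag{A.25}$$

where

$$2Q_1 = \Big\{2\mathcal{K}(0) - \int \mathcal{K}^2(u)du\Big\}\frac{\int \sigma^2(z)dz}{\int \sigma^2(z)f(z)dz}. \tag{A.26}$$

Considering the term $B_{13} - B_{41,2}$:

$$\begin{aligned} B_{13} - B_{41,2} &= \frac{1}{n}\sum_{i=1}^n\sum_{j \neq i}\varepsilon_i\varepsilon_j\Big\{w_{ij}(B) + w_{ji}(B) - \sum_{k=1}^n w_{ki}(B)w_{kj}(B)\Big\} \\ &\quad + \frac{1}{n}\sum_{i=1}^n\sum_{j \neq i}\varepsilon_i\varepsilon_j\Big\{(w_{ij}(\hat B) - w_{ij}(B)) + (w_{ji}(\hat B) - w_{ji}(B)) - \sum_{k=1}^n (w_{ki}(\hat B)w_{kj}(\hat B) - w_{ki}(B)w_{kj}(B))\Big\} \\ &=: D_1 + D_2 = D_1 + o_p(D_1). \end{aligned} \tag{A.27}$$

The last equation holds due to the consistency of $\hat B$ for $B$. Consequently, $H_n$ can be rewritten as

$$H_n = D_1 + \frac{2Q_1\int \sigma^2(z)f(z)dz}{nh} + o_p(D_1) + o_p\Big(\frac{1}{n\sqrt{h}}\Big). \tag{A.28}$$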


For the term $D_1$ in (A.28), based on Whittle (1964), Jong (1987) and Dette (2000), we can similarly prove that

$$n\sqrt{h}\Big(H_n - \frac{2Q_1\int \sigma^2(z)f(z)dz}{nh}\Big) \Rightarrow N(0, \varrho_0^2).$$

Recalling the expression of the statistic $T_n$ in (2.6) and Lemma 2, by Slutsky's Theorem we can obtain

$$\sqrt{h}\Big(T_n - \frac{Q_1}{h}\Big) = \frac{n\sqrt{h}\big(H_n - (2Q_1/nh)\int \sigma^2(z)f(z)dz\big)}{2RSS_1/n} \Rightarrow N(0, V_0),$$

where

$$V_0 = \frac{\varrho_0^2}{4\big(\int \sigma^2(z)f(z)dz\big)^2} = \frac{\int \sigma^4(z)dz\int [2\mathcal{K}(u) - \mathcal{K} * \mathcal{K}(u)]^2du}{2\big(\int \sigma^2(z)f(z)dz\big)^2}, \tag{A.29}$$

and $\mathcal{K} * \mathcal{K}$ denotes the convolution of $\mathcal{K}$ and $\mathcal{K}$.
The proof of Theorem 1 is finished. □
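The kernel-dependent factor $2\mathcal{K}(0) - \int \mathcal{K}^2(u)du$ in (A.26) can be evaluated for the kernel $\mathcal{K}(u) = \frac{15}{16}(1 - u^2)^2$ used in the simulation studies. The sketch below does this with a simple midpoint rule; the helper names (`biweight`, `integrate`, `kernel_constants`) and the step count are illustrative assumptions. For this kernel the exact values are $\mathcal{K}(0) = 15/16$ and $\int \mathcal{K}^2(u)du = 5/7$, so the factor equals $65/56$.

```python
def biweight(u):
    """K(u) = 15/16 (1 - u^2)^2 on |u| <= 1, and 0 otherwise."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation of the integral of func on [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))


def kernel_constants():
    """Return K(0), the integral of K^2, and 2 K(0) - integral of K^2."""
    k0 = biweight(0.0)
    k_sq = integrate(lambda u: biweight(u) ** 2, -1.0, 1.0)
    return k0, k_sq, 2.0 * k0 - k_sq
```

Because the factor is positive, the centring term $Q_1/h$ grows as $h \to 0$, which is the bias that the corrected statistics are designed to remove.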


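Proof of Theorem 2. Under the null hypothesis in (1.1), $m(x) = g(\beta^T x, \theta)$ holds and, from Lemma 4, we have $\widetilde{RSS}_1/n = \sum_{i=1}^n [y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)]/n + o_p(1)$. As for the numerator of the statistic $T_n$ in (2.11), denote $J_n = (RSS_0 - \widetilde{RSS}_1)/n$; then we have

$$\begin{aligned} J_n &= \frac{1}{n}\sum_{i=1}^n [\hat m(\hat B(\hat q)^T x_i) - g(\hat\beta^T x_i, \hat\theta)][y_i - g(\hat\beta^T x_i, \hat\theta)] + o_p(1) \\ &= \frac{1}{n}\sum_{i=1}^n \Big[\sum_{j \neq i} w_{ij}(\hat B)m(x_j) - m(B^T x_i) + g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta) + \sum_{j \neq i} w_{ij}(\hat B)\varepsilon_j\Big] \\ &\qquad\quad \times [g(\beta^T x_i, \theta) + \varepsilon_i - g(\hat\beta^T x_i, \hat\theta)] + o_p(1) \\ &= \frac{1}{n}\sum_{i=1}^n \Big[\sum_{j \neq i} w_{ij}(\hat B)m(x_j) - m(B^T x_i)\Big][g(\beta^T x_i, \theta) + \varepsilon_i - g(\hat\beta^T x_i, \hat\theta)] \\ &\quad + \frac{1}{n}\sum_{i=1}^n [g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)][g(\beta^T x_i, \theta) + \varepsilon_i - g(\hat\beta^T x_i, \hat\theta)] \\ &\quad + \frac{1}{n}\sum_{i=1}^n [g(\beta^T x_i, \theta) + \varepsilon_i - g(\hat\beta^T x_i, \hat\theta)]\sum_{j \neq i} w_{ij}(\hat B)\varepsilon_j + o_p(1) \\ &=: J_{n1} + J_{n2} + J_{n3} + o_p(1). \end{aligned}$$

For the first term $J_{n1}$, similarly to the proof of (A.19), it is not difficult to obtain that $\sum_{j \neq i} w_{ij}(\hat B)m(x_j) - m(B^T x_i) = O_p(h^r)$; further we have

$$\begin{aligned} J_{n1} &= \frac{1}{n}\sum_{i=1}^n \Big[\sum_{j \neq i} w_{ij}(\hat B)m(x_j) - m(B^T x_i)\Big][g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)] + \frac{1}{n}\sum_{i=1}^n \varepsilon_i\Big[\sum_{j \neq i} w_{ij}(\hat B)m(x_j) - m(B^T x_i)\Big] \\ &= O_p(h^r) \times O_p\Big(\frac{1}{\sqrt{n}}\Big) + O_p(h^r) \times O_p\Big(\frac{1}{\sqrt{n}}\Big) = O_p\Big(\frac{h^r}{\sqrt{n}}\Big). \end{aligned}$$

As to the second term $J_{n2}$, just as for the terms $B_2$ in (A.16) and $B_3$ in (A.17), we can obtain $J_{n2} = O_p(1/n)$.

Turning to the term $J_{n3}$, we first define the following term:

$$\begin{aligned} J_{n3}^* &= \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j \neq i}\frac{\mathcal{K}_h\{B^T(x_i - x_j)\}\varepsilon_j g'(\alpha_2)^T(\alpha_0 - \hat\alpha)}{f(x_i)} + \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j \neq i}\frac{\mathcal{K}_h\{B^T(x_i - x_j)\}\varepsilon_i\varepsilon_j}{f(x_i)} \\ &=: J_{n3,1}^* + J_{n3,2}^*, \end{aligned}$$

where $\mathcal{K}_h(\cdot) = \mathcal{K}(\cdot/h)/h$ and $\alpha_2$ lies between $\alpha_0$ and $\hat\alpha$. It is not difficult to derive that $J_{n3} = J_{n3}^* + o_p(J_{n3}^*)$. As to the term $J_{n3,1}^*$, we can conclude that $J_{n3,1}^* = o_p(1/n)$. Note that $J_{n3,2}^*$ is a degenerate U-statistic. Following Zheng (1996), it can be easily obtained that $nh^{1/2}J_{n3,2}^* \Rightarrow N(0, \Sigma_1)$ with $\Sigma_1 = 2\int \mathcal{K}^2(u)du\int \sigma^4(z)dz$. Under the condition $nh^{1/2+2r} \to 0$, we have $nh^{1/2}J_{n1} \Rightarrow 0$. Further it can be concluded that $nh^{1/2}J_n \Rightarrow N(0, \Sigma_1)$. Based on Lemma 4 and through Slutsky's Theorem, Theorem 2 can be obtained.

The proof of Theorem 2 is completed. □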

Proof of Theorem 3. We first consider the global alternative (1.2). Under this alternative, Remark 1 shows that $\hat q \to q \ge 1$. From Lemma 3, $\hat\alpha$ is a root-$n$ consistent estimate of $\tilde\alpha$, which is different from the true value $\alpha_0$ under the null hypothesis. Let $\Delta(x_i) = m(B^T x_i) - g(\tilde\beta^T x_i, \tilde\theta)$. Then $y_i = \varepsilon_i + m(x_i) = \varepsilon_i + \Delta(x_i) + g(\tilde\beta^T x_i, \tilde\theta)$. Similar to the proof of Theorem 1, it is then easy to see that $H_n = (RSS_0 - RSS_1)/n \Rightarrow E(\Delta^2(X))$. Together with Lemma 2, we can obtain that $T_n/n \to \text{Constant} > 0$ in probability. Thus $\sqrt{h}T_n = \sqrt{h}n \times T_n/n \to \infty$.

Under the local alternative hypothesis in (3.7), also denote $H_n = (RSS_0 - RSS_1)/n$ and we have

$$\begin{aligned} H_n &= \frac{1}{n}\sum_{i=1}^n [y_i - g(\hat\beta^T x_i, \hat\theta)]^2 - \frac{1}{n}\sum_{i=1}^n [y_i - \hat m(\hat B^T x_i)]^2 \\ &= \frac{1}{n}\sum_{i=1}^n [2y_i - g(\hat\beta^T x_i, \hat\theta) - \hat m(\hat B^T x_i)][\hat m(\hat B^T x_i) - g(\hat\beta^T x_i, \hat\theta)] \\ &= \frac{1}{n}\sum_{i=1}^n [2\varepsilon_i + m(x_i) - \hat m(\hat B^T x_i) + g(\beta^T x_i, \theta) + C_n m(B^T x_i) - g(\hat\beta^T x_i, \hat\theta)] \\ &\qquad\quad \times [\hat m(\hat B^T x_i) - m(x_i) + g(\beta^T x_i, \theta) + C_n m(B^T x_i) - g(\hat\beta^T x_i, \hat\theta)] \\ &= \frac{2}{n}\sum_{i=1}^n \varepsilon_i[\hat m(\hat B^T x_i) - m(x_i)] + \frac{2}{n}\sum_{i=1}^n \varepsilon_i[g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)] + \frac{2C_n}{n}\sum_{i=1}^n \varepsilon_i m(B^T x_i) \\ &\quad + \frac{1}{n}\sum_{i=1}^n [g(\beta^T x_i, \theta) + C_n m(B^T x_i) - g(\hat\beta^T x_i, \hat\theta)]^2 - \frac{1}{n}\sum_{i=1}^n [\hat m(\hat B^T x_i) - m(x_i)]^2 \\ &=: H_{n1} + H_{n2} + H_{n3} + H_{n4} - H_{n5}. \end{aligned} \tag{A.30}$$

From Lemma 3, $\hat\alpha - \alpha = O_p(1/\sqrt{n})$. Thus, similar to the derivation of $B_2$, it is not difficult to verify that

$$n\sqrt{h}H_{n2} = n\sqrt{h} \times O_p\Big(\frac{1}{n}\Big) = o_p(1).$$

For $H_{n3}$, since $E\{\varepsilon m(B^T X)\} = 0$ and $\mathrm{Var}\{\varepsilon m(B^T X)\} \to c$, where $c$ is a constant, when $C_n = n^{-1/2}h^{-1/4}$ we have

$$n\sqrt{h}H_{n3} = n\sqrt{h} \times O_p\Big(\frac{C_n}{\sqrt{n}}\Big) = o_p(1).$$

Turn to the term $H_{n4}$:

$$\begin{aligned} H_{n4} &= \frac{C_n^2}{n}\sum_{i=1}^n m^2(B^T x_i) + \frac{2C_n}{n}\sum_{i=1}^n m(B^T x_i)[g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)] + \frac{1}{n}\sum_{i=1}^n [g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)]^2 \\ &=: H_{n4,1} + H_{n4,2} + H_{n4,3}. \end{aligned}$$

For the term $H_{n4,1}$, when $C_n = n^{-1/2}h^{-1/4}$,

$$n\sqrt{h}H_{n4,1} = n\sqrt{h}C_n^2 \times E[m^2(B^T X)] + o_p(1) = E[m^2(B^T X)] + o_p(1).$$

With regard to the term $H_{n4,2}$,

$$n\sqrt{h}H_{n4,2} = n\sqrt{h} \times O_p\Big(\frac{C_n}{\sqrt{n}}\Big) = o_p(1).$$

Similar to the formula (A.17), we have

$$n\sqrt{h}H_{n4,3} = n\sqrt{h} \times O_p\Big(\frac{1}{n}\Big) = o_p(1).$$

According to the above three formulae, we can derive that

$$n\sqrt{h}H_{n4} = E[m^2(B^T X)] + o_p(1).$$

It is easy to verify that the terms $H_{n1}$ and $H_{n5}$ in (A.30) are the same as the terms $B_1$ and $B_4$ in (A.15), respectively. Thus, similar to the proof of Theorem 1, under the local alternative hypothesis (3.7), we can derive that

$$\sqrt{h}\Big(T_n - \frac{Q_1}{h}\Big) \Rightarrow N(\mu_1, V_0),$$

where

$$\mu_1 = \frac{E[m^2(B^T X)]}{2\int \sigma^2(z)f(z)dz},$$

and $V_0$ has been defined in (A.29).
Theorem 3 is proved. □


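Proof of Theorem 4. Consider the global alternative in (1.2) first. Denote $\Delta(x_i) = m(B^T x_i) - g(\tilde\beta^T x_i, \tilde\theta)$; thus, $y_i = m(x_i) + \varepsilon_i = \Delta(x_i) + g(\tilde\beta^T x_i, \tilde\theta) + \varepsilon_i$. We first consider the denominator $\widetilde{RSS}_1$ in the statistic $T_n$ in (2.11). Based on the formula (A.14), it can be obtained that

$$\frac{1}{n}\widetilde{RSS}_1 = \frac{1}{n}\sum_{i=1}^n \big|\varepsilon_i^2 + \varepsilon_i\Delta(x_i)\big| + o_p(1).$$

By the conditions in the Appendix, namely $E|\varepsilon| < \infty$, $E|g(\beta^T X, \theta)| < \infty$ and $E|m(B^T X)| < \infty$, we have

$$\widetilde{RSS}_1/n \Rightarrow K_1 > 0, \tag{A.31}$$

where $K_1$ is a positive constant. Turning to the numerator of the statistic $T_n$, denote $J_n = (RSS_0 - \widetilde{RSS}_1)/n$. In order to remove the absolute value sign in $\widetilde{RSS}_1$, we consider its terms as follows. For $i = 1, 2, \ldots, n$, assume that there are $m$ ($m \le n$) terms which satisfy $[y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)] > 0$. For these $m$ terms, we have

$$\begin{aligned} \frac{1}{n}\{RSS_0 - \widetilde{RSS}_1\}_{\{m\}} &= \frac{1}{n}\sum_{\{m\}} [y_i - g(\hat\beta^T x_i, \hat\theta)][\hat m(\hat B(\hat q)^T x_i) - g(\hat\beta^T x_i, \hat\theta)] + o_p(1) \\ &= \frac{1}{n}\sum_{\{m\}} [\Delta(x_i) + \varepsilon_i + g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta)] \\ &\qquad\quad \times [\hat m(\hat B(\hat q)^T x_i) - m(B^T x_i) + \Delta(x_i) + g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta)] + o_p(1) \\ &= \frac{1}{n}\sum_{\{m\}} \Delta^2(x_i) + o_p(1). \end{aligned} \tag{A.32}$$

The other $n - m$ terms satisfy $[y_i - \hat m(\hat B(\hat q)^T x_i)][y_i - g(\hat\beta^T x_i, \hat\theta)] < 0$; thus,

$$\begin{aligned} \frac{1}{n}\{RSS_0 - \widetilde{RSS}_1\}_{\{n-m\}} &= \frac{1}{n}\sum_{\{n-m\}} [y_i - g(\hat\beta^T x_i, \hat\theta)][2y_i - \hat m(\hat B(\hat q)^T x_i) - g(\hat\beta^T x_i, \hat\theta)] + o_p(1) \\ &= \frac{1}{n}\sum_{\{n-m\}} [\Delta(x_i) + \varepsilon_i + g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta)] \\ &\qquad\quad \times [m(B^T x_i) - \hat m(\hat B(\hat q)^T x_i) + \Delta(x_i) + g(\tilde\beta^T x_i, \tilde\theta) - g(\hat\beta^T x_i, \hat\theta) + 2\varepsilon_i] + o_p(1) \\ &= \frac{1}{n}\sum_{\{n-m\}} [\Delta^2(x_i) + 2\varepsilon_i^2] + o_p(1). \end{aligned} \tag{A.33}$$

Combining (A.32) and (A.33), under the conditions in the Appendix, it can be derived that

$$J_n = \frac{1}{n}\{RSS_0 - \widetilde{RSS}_1\}_{\{m\}} + \frac{1}{n}\{RSS_0 - \widetilde{RSS}_1\}_{\{n-m\}} \Rightarrow K_2, \tag{A.34}$$

where $K_2$ is a positive constant. Based on (A.31) and (A.34), we can obtain that $T_n/n = J_n/(\widetilde{RSS}_1/n) \to K_2/K_1 = C_1 > 0$ in probability, where $C_1$ is a positive constant. Further, $\sqrt{h}T_n = n\sqrt{h} \times T_n/n \to \infty$, which completes the proof under the global alternative hypothesis.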


Under the local alternative hypothesis in 113.71) . from Lemma 01 we have 
RSSi/n = Yli=i[Vi ~ MB(q) T Xi)][yi - g0 T x t , 9)}/n + o p ( 1). Let J n = 
(RSSq — RSSi)/n and we have 


Jn 


I 

- 22 [MB{q) T Xi) - g0 T Xi, §)] [yi - g0 T Xi, 9)\ + o p (l) 

i =1 

j n n 

+ e j} “ +5(/3 T ^#) - + C' n m(R T Xj) 

i=r j^i 

x [ 5 (^ T Xj, (9) + £i + C n m(B T Xi ) - g0 T Xi, (9)] + o p (l) 

~22 \ 21 + e ii ~ m (^ T; u) + (9) - g0 T Xi, 9) 

.7/2 

SJ n 

x[g0 T Xi,9) +£i- g0 T Xi,§)\ + -A m(ff T Xj)[ff(/3 T Zj,0) + e* - g(/3 T Xi,§)\ 

i=l 



n 


n 


n 


22 m ( B ^ x i ) [E w;ij(R){m(xj) + £j} 
i=l j^i 


m(B T Xi) + g0 T Xi, 9) - g0 T x^ 9) 


40 
















s~i 2 n 

H —~ E ™ 2 (-B T ®i) + Op(l) 

n z —' 

= : Ini + In2 + In3 + ^n4 + O p (l). 

For the term $I_{n1}$, from the proof of Theorem 2 it can be obtained that $nh^{1/2} I_{n1} \Rightarrow N(0, \Sigma_1)$. Due to the fact that $E(\varepsilon|X) = 0$ and the root-$n$ consistency of $\hat\alpha = (\hat\beta, \hat\theta)^T$ to $\alpha = (\beta, \theta)^T$, it is not difficult to show that $I_{n2} = O_p(C_n/\sqrt{n})$. Thus, when $C_n = n^{-1/2}h^{-1/4}$, $nh^{1/2} I_{n2} \Rightarrow 0$. As to the term $I_{n4}$ with $C_n = n^{-1/2}h^{-1/4}$, we have $nh^{1/2} I_{n4} \Rightarrow E\{m^2(B^T X)\}$.
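The two rate statements for $I_{n2}$ and $I_{n4}$ reduce to simple arithmetic in $n$ and $h$: with $C_n = n^{-1/2}h^{-1/4}$, the factor $nh^{1/2} \cdot C_n/\sqrt{n}$ equals $h^{1/4}$ and $nh^{1/2} \cdot C_n^2$ equals 1. The following minimal sketch checks this numerically; the bandwidth $h = n^{-1/5}$ is an assumption made only for the illustration, and `local_rates` is a hypothetical helper name.

```python
import math


def local_rates(n: int, h: float) -> dict:
    """Normalised orders of I_n2 and I_n4 when C_n = n^{-1/2} h^{-1/4}."""
    c_n = n ** -0.5 * h ** -0.25
    scale = n * math.sqrt(h)  # the normalising factor n h^{1/2}
    return {
        # n h^{1/2} * C_n / sqrt(n) simplifies to h^{1/4}, which vanishes as h -> 0
        "I_n2_order": scale * c_n / math.sqrt(n),
        # n h^{1/2} * C_n^2 simplifies to 1 for every n and h
        "I_n4_factor": scale * c_n ** 2,
    }


n = 10_000
h = n ** -0.2  # assumed bandwidth rate, for illustration only
rates = local_rates(n, h)
print(rates)
```

The unit factor for $I_{n4}$ is why this choice of $C_n$ yields a non-degenerate shift $E\{m^2(B^T X)\}$ in the limit.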

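Finally, consider the term $I_{n3}$,

$$
\begin{aligned}
I_{n3} &= \frac{C_n}{n}\sum_{i=1}^n m(B^T x_i)\sum_{j \neq i} w_{ij}(\hat B)\,\varepsilon_j
+ \frac{C_n}{n}\sum_{i=1}^n m(B^T x_i)\Big[\sum_{j \neq i} w_{ij}(\hat B)\, m(x_j) - m(B^T x_i)\Big] \\
&\quad + \frac{C_n}{n}\sum_{i=1}^n m(B^T x_i)\,[g(\beta^T x_i, \theta) - g(\hat\beta^T x_i, \hat\theta)] \\
&=: I_{n3,1} + I_{n3,2} + I_{n3,3}.
\end{aligned}
$$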

From Zheng (1996), it follows that $I_{n3,1} = O_p(C_n/\sqrt{n})$. For $I_{n3,2}$, from the proof of Theorem O it can be derived that $I_{n3,2} = O_p(C_n h^r)$. Finally, due to the fact that $\hat\alpha = (\hat\beta, \hat\theta)^T$ is a root-$n$ consistent estimate of $\alpha = (\beta, \theta)^T$, we can conclude that $I_{n3,3} = O_p(C_n/\sqrt{n})$. With $C_n = n^{-1/2}h^{-1/4}$ and condition (C5), we get $nh^{1/2} I_{n3} = o_p(1)$. Thus based on
Lemma 0]and using Slutsky’s Theorem, Theorem [His obtained. 

The proof of Theorem [His finished. □ 
Table 1: Empirical sizes and powers of $S_n^{OPG}$, $R_n^{OPG}$ and $S_n^{MAVE}$, $R_n^{MAVE}$ for $H_0$ v.s. $H_{11}$ and $H_{12}$ at significance level $\alpha = 0.05$ with $p = 8$.

| Model | ε | a | n = 100: S_n^OPG | R_n^OPG | S_n^MAVE | R_n^MAVE | n = 200: S_n^OPG | R_n^OPG | S_n^MAVE | R_n^MAVE |
|---|---|---|---|---|---|---|---|---|---|---|
| H_11 | ε ~ N(0,1) | 0 | 0.045 | 0.047 | 0.046 | 0.051 | 0.043 | 0.051 | 0.043 | 0.052 |
| | | 0.1 | 0.062 | 0.069 | 0.055 | 0.064 | 0.072 | 0.085 | 0.080 | 0.086 |
| | | 0.2 | 0.160 | 0.175 | 0.175 | 0.179 | 0.314 | 0.340 | 0.308 | 0.343 |
| | | 0.3 | 0.376 | 0.428 | 0.416 | 0.439 | 0.726 | 0.762 | 0.712 | 0.756 |
| | | 0.4 | 0.695 | 0.728 | 0.712 | 0.731 | 0.966 | 0.971 | 0.967 | 0.979 |
| | | 0.5 | 0.899 | 0.917 | 0.909 | 0.918 | 0.998 | 0.999 | 1.000 | 1.000 |
| | ε ~ t(5) | 0 | 0.060 | 0.045 | 0.065 | 0.046 | 0.059 | 0.048 | 0.059 | 0.046 |
| | | 0.1 | 0.077 | 0.073 | 0.079 | 0.075 | 0.083 | 0.077 | 0.086 | 0.084 |
| | | 0.2 | 0.143 | 0.141 | 0.144 | 0.142 | 0.211 | 0.216 | 0.220 | 0.226 |
| | | 0.3 | 0.268 | 0.265 | 0.276 | 0.287 | 0.467 | 0.512 | 0.449 | 0.510 |
| | | 0.4 | 0.449 | 0.474 | 0.486 | 0.489 | 0.780 | 0.817 | 0.790 | 0.807 |
| | | 0.5 | 0.697 | 0.728 | 0.700 | 0.721 | 0.949 | 0.966 | 0.950 | 0.976 |
| H_12 | ε ~ N(0,1) | 0 | 0.053 | 0.049 | 0.040 | 0.047 | 0.046 | 0.048 | 0.055 | 0.053 |
| | | 0.1 | 0.132 | 0.135 | 0.128 | 0.125 | 0.213 | 0.232 | 0.221 | 0.217 |
| | | 0.2 | 0.532 | 0.545 | 0.548 | 0.543 | 0.879 | 0.887 | 0.870 | 0.866 |
| | | 0.3 | 0.918 | 0.928 | 0.916 | 0.915 | 0.998 | 0.999 | 0.999 | 0.999 |
| | | 0.4 | 0.996 | 0.998 | 0.995 | 0.996 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | 0.5 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | ε ~ t(5) | 0 | 0.061 | 0.054 | 0.078 | 0.055 | 0.061 | 0.055 | 0.063 | 0.048 |
| | | 0.1 | 0.117 | 0.100 | 0.126 | 0.103 | 0.147 | 0.149 | 0.156 | 0.152 |
| | | 0.2 | 0.385 | 0.377 | 0.376 | 0.371 | 0.590 | 0.638 | 0.624 | 0.649 |
| | | 0.3 | 0.717 | 0.727 | 0.730 | 0.727 | 0.958 | 0.966 | 0.964 | 0.965 |
| | | 0.4 | 0.943 | 0.948 | 0.937 | 0.938 | 0.999 | 0.999 | 0.998 | 0.999 |
| | | 0.5 | 0.992 | 0.992 | 0.993 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 |
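When reading the rows with a = 0, it helps to know how far an empirical size can drift from the nominal level through Monte Carlo error alone. The sketch below computes a binomial standard error and an approximate 95% band around the nominal level. The number of replications is not stated in this excerpt, so the value 2000 used here is purely a hypothetical assumption, and the helper names are introduced only for this illustration.

```python
import math


def mc_standard_error(rate: float, replications: int) -> float:
    """Binomial standard error of an empirical rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / replications)


def size_band(nominal: float, replications: int, z: float = 1.96) -> tuple:
    """Approximate 95% band for the empirical size under the nominal level."""
    se = mc_standard_error(nominal, replications)
    return nominal - z * se, nominal + z * se


REPLICATIONS = 2000  # hypothetical; the replication count is not given in this excerpt
low, high = size_band(0.05, REPLICATIONS)

# Empirical sizes from Table 1 (H_11, normal errors, a = 0, n = 100).
sizes = [0.045, 0.047, 0.046, 0.051]
inside = [low <= s <= high for s in sizes]
print(round(low, 4), round(high, 4), inside)
```

With a different replication count the band widens or narrows in proportion to the inverse square root of that count, so the conclusion should be re-evaluated once the true number is known.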
Table 2: Empirical sizes and powers of $S_n^{OPG}$, $R_n^{OPG}$ and $S_n^{MAVE}$, $R_n^{MAVE}$ for $H_0$ v.s. $H_{13}$ and $H_{14}$ at significance level $\alpha = 0.05$ with $p = 8$.

| Model | ε | a | n = 100: S_n^OPG | R_n^OPG | S_n^MAVE | R_n^MAVE | n = 200: S_n^OPG | R_n^OPG | S_n^MAVE | R_n^MAVE |
|---|---|---|---|---|---|---|---|---|---|---|
| H_13 | ε ~ N(0,1) | 0 | 0.043 | 0.048 | 0.042 | 0.046 | 0.036 | 0.046 | 0.042 | 0.053 |
| | | 0.2 | 0.088 | 0.090 | 0.091 | 0.089 | 0.144 | 0.146 | 0.131 | 0.139 |
| | | 0.4 | 0.300 | 0.308 | 0.305 | 0.297 | 0.617 | 0.635 | 0.636 | 0.648 |
| | | 0.6 | 0.677 | 0.683 | 0.697 | 0.694 | 0.970 | 0.974 | 0.974 | 0.977 |
| | | 0.8 | 0.927 | 0.928 | 0.921 | 0.922 | 0.999 | 0.999 | 0.999 | 1.000 |
| | | 1.0 | 0.989 | 0.990 | 0.992 | 0.993 | 1.000 | 1.000 | 1.000 | 1.000 |
| | ε ~ t(5) | 0 | 0.062 | 0.051 | 0.071 | 0.054 | 0.055 | 0.047 | 0.061 | 0.047 |
| | | 0.2 | 0.090 | 0.087 | 0.098 | 0.087 | 0.106 | 0.094 | 0.122 | 0.107 |
| | | 0.4 | 0.194 | 0.191 | 0.218 | 0.216 | 0.375 | 0.392 | 0.402 | 0.417 |
| | | 0.6 | 0.445 | 0.445 | 0.458 | 0.458 | 0.800 | 0.813 | 0.794 | 0.830 |
| | | 0.8 | 0.700 | 0.710 | 0.727 | 0.735 | 0.962 | 0.968 | 0.975 | 0.979 |
| | | 1.0 | 0.892 | 0.901 | 0.889 | 0.902 | 0.998 | 0.997 | 0.996 | 0.997 |
| H_14 | ε ~ N(0,1) | 0 | 0.060 | 0.049 | 0.058 | 0.048 | 0.059 | 0.054 | 0.045 | 0.053 |
| | | 0.2 | 0.068 | 0.066 | 0.074 | 0.082 | 0.090 | 0.107 | 0.092 | 0.106 |
| | | 0.4 | 0.201 | 0.210 | 0.197 | 0.212 | 0.416 | 0.446 | 0.443 | 0.458 |
| | | 0.6 | 0.448 | 0.488 | 0.500 | 0.501 | 0.885 | 0.879 | 0.895 | 0.892 |
| | | 0.8 | 0.768 | 0.769 | 0.797 | 0.798 | 0.993 | 0.988 | 0.997 | 0.995 |
| | | 1.0 | 0.920 | 0.931 | 0.946 | 0.943 | 1.000 | 1.000 | 1.000 | 1.000 |
| | ε ~ t(5) | 0 | 0.081 | 0.046 | 0.070 | 0.045 | 0.073 | 0.049 | 0.064 | 0.054 |
| | | 0.2 | 0.103 | 0.063 | 0.104 | 0.075 | 0.090 | 0.078 | 0.090 | 0.071 |
| | | 0.4 | 0.152 | 0.153 | 0.164 | 0.158 | 0.240 | 0.246 | 0.261 | 0.265 |
| | | 0.6 | 0.309 | 0.318 | 0.318 | 0.330 | 0.598 | 0.643 | 0.640 | 0.666 |
| | | 0.8 | 0.507 | 0.534 | 0.534 | 0.556 | 0.893 | 0.900 | 0.922 | 0.929 |
| | | 1.0 | 0.719 | 0.759 | 0.761 | 0.782 | 0.987 | 0.987 | 0.989 | 0.990 |
Table 3: Empirical sizes and powers of $R_n^{OPG}$ and $T_n^{FZZ}$ for $H_0$ v.s. $H_{21}$ at significance level $\alpha = 0.05$ with $p = 4$.

| Model | X | a | n = 100: R_n^OPG | T_{n,A}^FZZ | T_{n,B}^FZZ | n = 200: R_n^OPG | T_{n,A}^FZZ | T_{n,B}^FZZ |
|---|---|---|---|---|---|---|---|---|
| H_21 | X ~ N(0, Σ_1) | 0 | 0.054 | 0.069 | 0.051 | 0.051 | 0.067 | 0.046 |
| | | 0.2 | 0.096 | 0.073 | 0.061 | 0.111 | 0.068 | 0.061 |
| | | 0.4 | 0.234 | 0.074 | 0.086 | 0.387 | 0.069 | 0.119 |
| | | 0.6 | 0.451 | 0.074 | 0.143 | 0.778 | 0.077 | 0.223 |
| | | 0.8 | 0.729 | 0.082 | 0.247 | 0.968 | 0.078 | 0.435 |
| | | 1.0 | 0.882 | 0.083 | 0.359 | 0.998 | 0.079 | 0.666 |
| | X ~ N(0, Σ_2) | 0 | 0.050 | 0.077 | 0.049 | 0.046 | 0.068 | 0.050 |
| | | 0.2 | 0.086 | 0.078 | 0.062 | 0.110 | 0.072 | 0.078 |
| | | 0.4 | 0.246 | 0.082 | 0.117 | 0.502 | 0.073 | 0.144 |
| | | 0.6 | 0.575 | 0.083 | 0.193 | 0.892 | 0.075 | 0.354 |
| | | 0.8 | 0.818 | 0.085 | 0.334 | 0.987 | 0.076 | 0.665 |
| | | 1.0 | 0.946 | 0.089 | 0.510 | 1.000 | 0.077 | 0.878 |
| H_22 | X ~ N(0, Σ_1) | 0 | 0.049 | 0.067 | 0.048 | 0.048 | 0.071 | 0.047 |
| | | 0.3 | 0.139 | 0.073 | 0.059 | 0.210 | 0.072 | 0.058 |
| | | 0.6 | 0.474 | 0.074 | 0.074 | 0.750 | 0.073 | 0.086 |
| | | 0.9 | 0.818 | 0.077 | 0.094 | 0.991 | 0.074 | 0.133 |
| | | 1.2 | 0.957 | 0.078 | 0.147 | 0.999 | 0.076 | 0.252 |
| | | 1.5 | 0.993 | 0.083 | 0.221 | 1.000 | 0.083 | 0.443 |
| | X ~ N(0, Σ_2) | 0 | 0.051 | 0.077 | 0.048 | 0.048 | 0.068 | 0.047 |
| | | 0.3 | 0.076 | 0.084 | 0.065 | 0.102 | 0.070 | 0.048 |
| | | 0.6 | 0.210 | 0.085 | 0.073 | 0.363 | 0.072 | 0.099 |
| | | 0.9 | 0.457 | 0.086 | 0.113 | 0.782 | 0.075 | 0.156 |
| | | 1.2 | 0.754 | 0.089 | 0.177 | 0.987 | 0.077 | 0.300 |
| | | 1.5 | 0.925 | 0.090 | 0.265 | 1.000 | 0.078 | 0.484 |
Figure 1: Empirical sizes and powers of $R_n^{OPG}$ and $T_n^{FZZ}$ for $H_0$ v.s. $H_{22}$ at significance level $\alpha = 0.05$ with $n = 100$, $X \sim N(0, I_p)$, $\varepsilon \sim t(5)$. In four plots, the solid line and the dashed line are for $R_n^{OPG}$ and $T_n^{FZZ}$ with $p = 4$, respectively. The solid line marked with “o” and the dashed line marked with “+” are for $R_n^{OPG}$ and $T_n^{FZZ}$ with $p = 8$, respectively.
a). Powers for $R_n^{OPG}$ and $T_n^{GWZ}$ under $H_{31}$ with $\varepsilon \sim N(0, 1)$
b). Powers for $R_n^{OPG}$ and $T_n^{GWZ}$ under $H_{31}$ with $\varepsilon \sim DE(0, 1)$
c). Powers for $R_n^{OPG}$ and $T_n^{GWZ}$ under $H_{32}$ with $\varepsilon \sim N(0, 1)$
d). Powers for $R_n^{OPG}$ and $T_n^{GWZ}$ under $H_{32}$ with $\varepsilon \sim DE(0, 1)$

Figure 2: Empirical sizes and powers of $R_n^{OPG}$ and $T_n^{GWZ}$ for $H_0$ v.s. $H_{31}$ and $H_{32}$ at significance level $\alpha = 0.05$ with $n = 100$, $p = 8$. In four plots, the solid line and the dashed line are for $R_n^{OPG}$ and $T_n^{GWZ}$, respectively.
a). The scatter plot for Y and H
b). The scatter plot for Y and L
c). The scatter plot for Y and W
d). The scatter plot for Y and S

Figure 3: The scatter plots for original response Y and covariate H, L, W, S, respectively.
a). The scatter plot for Y* and H*
b). The scatter plot for Y* and L*
c). The scatter plot for Y* and W*
d). The scatter plot for Y* and S*

Figure 4: The scatter plots for transformed response Y* and covariate H*, L*, W*, S*, respectively.